ABSTRACT

One of 15 core modules in a 22-module series designed 
to train vocational education curriculum specialists (VECS), this
guide is intended for use by both instructor and student in a variety 
of education environments, including independent study, team 
teaching, seminars, and workshops, as well as in more conventional 
classroom settings. The guide has five major sections. Part I, 
Organization and Administration, contains an overview and rationale, 
educational goals and performance objectives, recommended learning 
materials, and suggested reference materials. Part II, Content and 
Study Activities, contains the content outline arranged by goals.
Study activities for each goal and its corresponding objectives
follow each section of the content outline. Content focus is on the
concept of criterion-referenced measurement within the framework of
educational evaluation, selecting approaches/techniques for assessing
student achievement of instructional objectives in the three domains
of learning, and developing an evaluation plan and constructing test
instruments for measuring student achievement of instructional
objectives. Part III, Group and Classroom Activities, suggests
classroom or group activities and discussions keyed to specific
content in the outline and to specific materials in the list of
references. Part IV, Student Self-Check, contains questions directly
related to the goals and objectives of the module, which may be used
as a pretest or posttest. Part V, Appendix, contains suggested
responses to the study activities from Part II and responses to the
student self-checks. (HD)

PREFACE



Who is a vocational education curriculum specialist? The answer 
to this question is not as simple as it might appear. A vocational 
education curriculum specialist is likely to work in many different 
capacities, including, but not limited to: instructor, department
chairperson, dean of vocational-technical education, vocational
supervisor, principal, state or local director of vocational education,
and curriculum coordinator.

The specialist is, perhaps, more identifiable by his/her respon- 
sibilities, which include, but are not limited to: 

• planning, organizing, actualizing, and controlling the work
of an educational team performed to determine and achieve
objectives.

• planning, organizing, and evaluating content and learning 
processes into sequential activities that facilitate the 
achievement of objectives. 

• diagnosing present and projected training needs of business,
industry, educational institutions, and the learner. 

• knowing, comparing, and analyzing different theories of curriculum
development, management, and evaluation and adapting them
for use in vocational-technical education.

This teaching/learning module is part of a set of materials repre- 
senting a comprehensive curriculum development project dealing with the 
training of vocational education curriculum specialists. The purpose 
of this two-year project was 1) to design, develop, and evaluate an 
advanced-level training program, with necessary instructional materials 
based on identified vocational education curriculum specialist compe- 
tencies, and 2) to create an installation guide to assist instructors 
and administrators in the implementation process. 

The curriculum presented here is, above all else, designed for 
flexible installation. These materials are not meant to be used only 
in the manner of an ordinary textbook. The materials can be used 
effectively by both instructor and student in a variety of educational
environments, including independent study, team teaching,
seminars, and workshops, as well as in more conventional classroom
settings.

Dr. James A. Dunn 
Principal Investigator and 
presently Director, 
Developmental Systems Group 
American Institutes for Research 



ERIC 






ACKNOWLEDGEMENTS 



The Vocational Education Curriculum Specialist Project was a 
comprehensive development and evaluation effort involving the 
contribution of a large number of people: project staff, curriculum 
consultants, a national advisory panel, and a number of cooperating 
colleges and universities. This wide variety of valuable inputs 
makes it difficult to accurately credit ideas, techniques, sugges- 
tions, and contributions to their originators. 

The members of the National Advisory Panel, listed below, were 
most helpful in their advice, suggestions, and criticisms. 



Myron Blee            Florida State Department of Education
James L. Blue         ECU Director, Olympia, Washington
Ralph C. Bohn         San Jose State University
Ken Edwards           International Brotherhood of Electrical Workers
Mary Ellis            President, American Vocational Association
George McCabe         Program Director, Consortium of California State
                      University and Colleges
Curtis Henson         Atlanta Independent School District, Georgia
Ben Hirst             Director, Consortium of the States, Atlanta, Georgia
Joseph Julianelle     U.S. Department of Labor
Lee Knack             Industrial Relations Director, Morrison-Knudsen, Inc.
Bette LaChapelle      Wayne State University
Jerome Moss, Jr.      University of Minnesota
Frank Pratzner        CVE, Ohio State University
Rita Richey           Wayne State University
Bryl R. Shoemaker     Ohio State Department of Education
William Stevenson     Oklahoma State Department of Education



The project would not have been possible without the cooperation 
and commitment of the field test institutions listed below. 



California State University, Long Beach
California Polytechnic State University, San Luis Obispo
Consortium of California State University and Colleges
    California State University, Sacramento
    California State University, San Diego
    California State University, San Francisco
    California State University, San Jose
    California State University, Los Angeles
Iowa State University
University of California Los Angeles
University of Northern Colorado



Overall responsibility for the direction and quality of the project
rested with James A. Dunn, Principal Investigator. Project
management, supervision, and coordination were under the direction
of John E. Bowers, Project Director.






TABLE OF CONTENTS

PREFACE

ACKNOWLEDGEMENTS

PART I. ORGANIZATION AND ADMINISTRATION

    Guidelines
    Overview and Rationale
    Goals and Objectives
    Recommended Materials
    Suggested References

PART II. CONTENT AND STUDY ACTIVITIES

    Goal 9.1
        Criterion-Referenced Testing: Basic Definitions
        Measurement vs. Evaluation: Basic Definitions
        Criterion-Referenced Measurement: An Historical Background
        Characteristics of Criterion-Referenced Tests and
            Norm-Referenced Tests
        Standardized Tests
        Study Activities

    Goal 9.2
        Measuring Instruments for the Cognitive Domain
        Measuring Instruments for the Affective Domain
        Measuring Instruments for the Psychomotor Domain
        Types of Written Test Items
        The Performance Test
        Study Activities

    Goal 9.3
        Test Construction
        Implementing Criterion-Referenced Measurement
        Wrapup of Module
        Study Activities

PART III. GROUP AND CLASSROOM ACTIVITIES

    Classroom Activities
    Discussion Questions

PART IV. STUDENT SELF-CHECK

    Part A: Knowledge Assessment
    Part B: Performance Assessment

PART V. APPENDICES

    Appendix A: Possible Study Activity Responses
    Appendix B: Possible Self-Check Responses




PART I 

ORGANIZATION AND ADMINISTRATION 



Guidelines

This study guide has five major sections. Each section contains useful 
information, suggestions, and/or activities that assist in the achievement 
of the competencies of a Vocational Education Curriculum Specialist. Each 
major section is briefly described below. 

PART I: ORGANIZATION AND ADMINISTRATION 

PART I contains an Overview and Rationale, Educational Goals and Performance 
Objectives, Recommended Learning Materials, and Suggested Reference 
Materials. This section will help the user answer the following questions: 

• How is the module organized?

• What is the educational purpose of the module? 

• What specifically should the user learn from this module? 

• What are the specific competencies emphasized in this module?

• What learning materials are necessary? 

• What related reference materials would be helpful? 

PART II: CONTENT AND STUDY ACTIVITIES 

Part II contains the content outline arranged by goals. The outline is a 
synthesis of information from many sources related to the major topics 
(goals and objectives) of the module. Study activities for each goal and 
its corresponding objectives follow each section of the content outline,
allowing students to complete the exercises related to Goal 1 before going 
on to Goal 2. 

PART III: GROUP AND CLASSROOM ACTIVITIES

The "Activities-Resources" column in the content outline contains references
to classroom or group activities and discussion questions related to
specific content in the outline. These activities and discussion questions
are located in PART III and are for optional use of either the instructor
or the student. Both the classroom activities and discussion questions are
accompanied by suggested responses for use as helpful examples only--they
do not represent conclusive answers to the problems and issues addressed.
Also contained in the "Activities-Resources" column are the reference
numbers of the resources used to develop the content outline. These
reference numbers correspond to the numbers of the Suggested Reference
Materials in PART I.

PART IV: STUDENT SELF-CHECK 

PART IV contains questions directly related to the goals and objectives of 
the module. The self-check may be used as a pre-test or as a post-test,
or as a periodic self-check for students in determining their own progress 
throughout the module. 

PART V: APPENDICES 

Appendix A contains responses to the Study Activities from PART II, and 
Appendix B contains responses to the Student Self-Check. The responses 
provide immediate feedback to the user and allow the module to be used 
more effectively for individualized study. They have been included in the
last part of the module as appendices to facilitate their removal should
the user wish to use them at a later time rather than concurrently with
the rest of the module.

Approximately 30 hours of out-of-class study will be necessary to complete
this module. 

Overview and Rationale 

The purpose of this module is to provide the future curriculum specialist 
with the knowledge and skills to develop test instruments that measure student 
achievement. Many texts enumerate the purposes of a testing program. Among 
these are the improvement of training or instruction, the motivation of
students, determination of grades, and use as a basis for selection and
guidance. For purposes of this module, however, "testing" refers
specifically to the assessment of student accomplishment of the
instructional objectives of a course (or instructional unit) as
specified in the criteria--hence the term "criterion-referenced
testing."

The traditional and primary form of testing in vocational education
has been "norm-referenced testing." And although norm-referenced
testing continues to be a viable form of testing when used for appropriate
purposes, it is the criterion-referenced test that is actually
the most appropriate measure of whether or not an instructional objective
has been achieved. It is with this latter form of testing that
this module is concerned.

The module begins by examining the concept of criterion-referenced
testing within the framework of educational evaluation. Although
criterion-referenced testing in the strictest sense is "measurement"
and not "evaluation," in a broader sense it cannot be isolated from
educational evaluation, which relies on numerous measurements to determine
the merit of various educational phenomena.

Next, the module examines techniques and approaches appropriate for
assessing students' achievement of instructional objectives in the
three domains of learning: cognitive, affective, and psychomotor.
Instructional objectives are amenable to a wide variety of assessment
techniques, not just the all-too-often used paper-and-pencil test. The
important point is that the technique selected match the requirements of
the objective. Paper-and-pencil tests, for example, are not appropriate
for determining whether or not a student is able to operate a wood
lathe properly.

The module then presents the curriculum specialist with a technique
for developing a plan from which to construct criterion-referenced test
instruments, and finally provides the specialist with an opportunity to
actually construct these instruments.



This module completes the series of three modules on the development of
instruction for vocational education. Module 7, Derivation and Specification
of Instructional Objectives, discussed procedures both for
identifying possible objectives for instruction and for writing such
objectives. Module 8, Development of Instructional Materials, described
the process of developing instruction to accomplish specific
objectives. Now this module, Module 9, completes the picture by presenting
means of assessing student achievement of the objectives of
instruction.

A variety of approaches to instructional development are in practice
in vocational education today. These approaches include: the integrated
approach; the occupational or job analysis approach; the
clusters, families, or common elements of occupations approach; the
functions of industry approach; and the concept approach. (Each of
these approaches is briefly described in Introductory Module 2: Roles
of Vocational Educators in Curriculum ...) This series of
modules on instructional development for vocational education follows
an occupational or job analysis approach because it is the most common
and is often used in combination with other curriculum techniques.

Goals and Objectives

Upon completion of this module, the student will be able to achieve the
following goals and objectives: 



GOAL 9.1: UNDERSTAND THE CONCEPT OF CRITERION-REFERENCED MEASUREMENT 
WITHIN THE FRAMEWORK OF EDUCATIONAL EVALUATION. 

Objective 9.11  Define the following terms: educational
                evaluation, educational measurement,
                criterion-referenced testing, and
                norm-referenced testing.

Objective 9.12  Identify the historical conditions that gave
                impetus to the use of criterion-referenced
                measurement.

Objective 9.13  Given a specific characteristic, determine
                whether that characteristic describes a
                norm-referenced test or a criterion-referenced
                test.

Objective 9.14  Distinguish between norm-referenced measurement
                and criterion-referenced measurement on the
                basis of: variability, item construction,
                reliability, validity, item analysis, and
                reporting and interpretation.



GOAL 9.2: SELECT APPROACHES/TECHNIQUES FOR ASSESSING STUDENT ACHIEVEMENT
OF INSTRUCTIONAL OBJECTIVES IN THE THREE DOMAINS OF LEARNING.

Objective 9.21  Recognize appropriate techniques for assessing
                student achievement of instructional objectives
                in the cognitive, affective, and psychomotor
                domains.

Objective 9.22  Identify the two basic types of written
                test questions and describe the advantages
                and limitations of each.

Objective 9.23  Define the term "performance test."

Objective 9.24  Select approaches/techniques for assessing
                student achievement of instructional objectives
                of a given unit of instruction.



GOAL 9.3: DEVELOP AN EVALUATION PLAN AND CONSTRUCT TEST INSTRUMENTS
FOR MEASURING STUDENT ACHIEVEMENT OF INSTRUCTIONAL OBJECTIVES.

Objective 9.31  Given an instructional objective stated in
                behavioral terms and a list of possible test
                items, identify those test items that would
                be appropriate for assessing the objective.

Objective 9.32  Develop an evaluation plan for assessing
                student achievement of the instructional
                objectives for a given unit of instruction.

Objective 9.33  Construct test instruments for assessing
                student achievement of the instructional
                objectives for a given unit of instruction.



Goal 9.1: Understand the Concept of
Criterion-Referenced Measurement Within
the Framework of Educational Evaluation.




A. Criterion-Referenced Testing: Basic Definitions

1. Various authors have defined the term 
"criterion-referenced testing." Let's examine 
some of those definitions now. 

2. According to Robert Glaser, "A criterion- 
referenced test is one that is deliberately 
constructed to yield measurements that are 
directly interpretable in terms of specified
performance standards" (11). 

3. According to Mager and Beach, a criterion test 
"determines how well the student's performance 
at the end of instruction coincides with the
performance called for in the objectives" (15). 

4. According to Kibler, Cegala, Barker, and Miles, 
a criterion-referenced test is "designed to 
determine whether a student has achieved 
mastery of a behavior as specified in an 
instructional objective" (14).

(11) "A Criterion-Referenced Test," p. 41.
(15) Developing Vocational Instruction, p. 40.
(14) Objectives for Instruction and Evaluation, p. 116.



Content Outline (continued) 



5. According to Butler, the criterion test 
measures the individual's proficiency against 
a predetermined set of absolute criteria. Its 
main purpose is to determine as accurately as 
possible when a student has reached the 
acceptable level of performance (5). 

6. According to Goldstein, "Criterion-referenced 
measures provide a standard of achievement for 
the individual as compared with specific be- 
havioral objectives and therefore provide an 
indicant of the degree of competence attained 
by the trainee" (13). 

B. Measurement vs. Evaluation: Basic Definitions*

1. It is necessary to distinguish between 
measurement and evaluation in order to put 
in perspective the concept of "criterion- 
referenced testing." 

2. A criterion-referenced test is a "measuring
instrument." "Measurement" refers to the
activity of gathering and quantifying information
through the use of a measuring instrument.
No inferences, interpretations, judgments,
or decisions are made about the information.
The measuring instrument can be anything
that collects raw data: teacher observation,
a true-false test, a rating scale, an
attitude scale, a personality inventory, an
IQ test, or an anecdotal record kept by the
teacher (7).



(5) Instructional Systems Development for Vocational
and Technical Training, p. 98.
(13) Training: Program Development and Evaluation, p. 63.
* See Discussion Question A in Part III.
(7) Using Instructional Objectives in Teaching, p. 68.



3. Measurement is one activity in the more 
general process of evaluation. "Evaluation 
not only includes measurement but also the 
making of judgments and decisions based upon 
the gathered information. It is in evaluation, 
not measurement, that experience, judgment, 
and intuition enter the picture" (7). 

4. According to Clark, "The majority of problems 
encountered by teachers as they evaluate can 
be directly linked to the quality and extent 
of their measurement. Inadequate evaluations 
are usually based upon faulty measurement or, 
in more extreme cases, upon little or no 
measurement. Though measurement is only one 
phase of the evaluation process, it is the 
basis from which the other phases stem" (7). 

5. A criterion-referenced test, then, is a 
measuring instrument that measures learning 
outcomes in connection with instructional 
objectives.*

C. Criterion-Referenced Measurement: An Historical
Background

1. The concept of criterion-referenced measure- 
ment is not entirely new to educators. In 
1918 Thorndike made reference to the distinc- 
tion between the two types of measurement: 
norm-referenced and criterion-referenced.
Thorndike, however, did not use these specific
terms (27).



(7) Using Instructional Objectives in Teaching, p. 69.
See also: (8) Home Economics Evaluation, Chap. 1.
* See Classroom Activity 1 in Part III.
(27) "The Nature, Purposes, and General Methods of
Measurements of Educational Products."

2. In 1963 Robert Glaser provided the initial
conceptual clarity and indicated the practical
implications of the two measurement procedures; 
his writing has stimulated numerous articles 
and papers elaborating on the applications, 
advantages, and liabilities of the two 
approaches (12).* 

D. Characteristics of Criterion-Referenced Tests and
Norm-Referenced Tests

1. Various statements have been made in the 
literature comparing norm-referenced and 
criterion-referenced measurement. What 
follows is a summary of those statements. 

2. Characteristics of Criterion-Referenced 
Measurement 

a. According to Smythe, Kibler, and Hutchings,
"The main function of criterion-referenced 
measurement is to assess whether the 
student has mastered a specific criterion 
or performance standard. 

b. Complete instructional objectives are 
specified in the construction of criterion- 
referenced tests. 

c. The criterion for mastery must be stated 
(i.e., instructional objectives) for use 
in criterion-referenced measurement. 

d. Test items for criterion-referenced tests 
are constructed to measure a predetermined 
level of proficiency. 

e. Variability is irrelevant; it is not a
necessary condition for a satisfactory
criterion-referenced test.

f. The test results from criterion-referenced
measurement suggest the use of a binary
system (i.e., satisfactory/unsatisfactory;
pass/fail). However, criterion-referenced
test results can be transposed into the
traditional grading system by following a
set of specifically constructed rules"
(24).

(12) "Instructional Technology and the Measurement
of Learning Outcomes: Some Questions."
* See Discussion Question B in Part III.

3. Characteristics of Norm-Referenced Measurement*

a. According to Smythe, Kibler, and Hutchings, 
"The main function of norm-referenced 
measurement is to ascertain the student's 
relative position within a normative group. 

b. Either general conceptual outcomes
(usually done) or precise objectives may
be specified when constructing norm- 
referenced tests. 

c. The criterion for mastery is not usually 
specified when using norm-referenced tests. 

d. Test items for norm-referenced measure- 
ment are constructed to discriminate among 
students. 

e. Variability of scores is desirable as an 
aid to meaningful interpretation. 

f. The test results from norm-referenced 
tests are amenable to transposition to the 
traditional grading system (A, B, C, D,
F)" (24). 
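The contrast between the two lists above can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions, not a procedure taken from the module: the student names, the scores, and the mastery cutoff of 16 out of 20 are hypothetical, and the percentile-rank formula (percent of the group scoring below a given score) is one common convention assumed here.

```python
def criterion_referenced(score, mastery_cutoff):
    """Binary interpretation: compare one score with a fixed standard."""
    return "pass" if score >= mastery_cutoff else "fail"


def percentile_rank(score, group_scores):
    """Norm-referenced interpretation: percent of the group scoring below."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


# Hypothetical data: scores out of 20 on the same test.
scores = {"Ana": 18, "Ben": 15, "Cy": 15, "Dee": 9}
cutoff = 16  # hypothetical mastery standard stated in the objective
group = list(scores.values())

for name, score in scores.items():
    print(name, criterion_referenced(score, cutoff), percentile_rank(score, group))
```

In this sketch a student's pass/fail result depends only on that student's score and the stated criterion, whereas the percentile rank changes whenever the scores of the rest of the group change.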

(24) "A Comparison of Norm-Referenced and Criterion-Referenced
Measurement with Implications for Communication Instruction."
* See Classroom Activity 2 in Part III.

4. Desirable Characteristics of a Criterion-
Referenced Measuring Instrument

a. A good criterion-referenced test will be
valid, reliable, objective, comprehensive,
and economical. (Although instructors
and curriculum specialists are most
familiar with these terms in regard to
norm-referenced tests, such characteristics
are desirable for criterion-referenced
tests as well. However, these terms take
on somewhat different and special meanings
when used in regard to criterion-referenced
tests.)
b. Validity. If a criterion-referenced
measuring instrument requires the same
behaviors that are identified in the
objectives, then the scores are said to
be valid. (Less precisely, the instrument
is said to be valid.) According to Clark,
"An objective asks the student to demonstrate
some behavior relative to some content;
a measuring instrument also asks the
student to demonstrate some behavior
relative to some content. The degree to
which the two behaviors and the two topics
correspond will be the degree to which the
instrument is valid. This type of
validity is called 'content' validity" (7).
Clark concludes that of all the desirable
characteristics of a measuring instrument,
content validity is the most important.
If a measuring instrument generally fails
to measure what it was designed to measure,
all other characteristics lose their
meaningfulness. To assure content
validity in their measuring instruments,
course developers and instructors should
make certain their objectives are clearly
defined, for the objectives provide the
standards for making judgments about the
validity of the measuring instruments.

(7) Using Instructional Objectives in Teaching, p. 69.
c. Reliability. A criterion-referenced test
that is reliable will be consistent in its
measurement. It will measure in exactly
the same way every time it is used. For
example, if a test is given to a specific
group of students one day and then given
again to the same group on another day,
the test scores should be relatively the
same for the individuals on both days.

If the objectives call for behavior that
is observable and measurable and the test
items call for the same behavior, the
test will probably have a high degree of
reliability.

d. Objectivity. A good criterion-referenced
test must be relatively objective, that is,
the judgment of the scorer should enter
the scoring process as little as possible.
The scores on a good test will be about
the same regardless of the individual
doing the scoring. All else being equal,
an objectively scored test (which does
not permit scorer bias to affect the score)
is more valid and reliable than a subjectively
scored one (5).

e. Comprehensiveness. With a test item for
every objective, the criterion-referenced
test will necessarily give comprehensive
coverage of all desired behaviors.

f. Economy. The criterion test must be
economical regarding time, manpower, and
facilities; but economy is strictly relative
when applied to an instructional
system. The gain in reliability and
validity through the use of a truly comprehensive,
criterion-referenced performance
test far outweighs the economy of
group paper-and-pencil tests (5).*

(5) Instructional Systems Development for Vocational
and Technical Training, p. 100.
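Three of the characteristics described above (content validity, comprehensiveness, and reliability) lend themselves to a simple check. The sketch below is a hedged illustration, not a method prescribed by the module: the objective labels, behaviors, contents, scores, and cutoff are hypothetical, and "decision consistency" (the share of students who receive the same pass/fail decision on two administrations) is an index assumed here as one way to express the test-retest idea for a criterion-referenced test.

```python
def alignment_report(objectives, items):
    """Compare each test item's behavior and content with its objective.

    objectives: {objective_id: (behavior, content)}
    items: list of (objective_id, behavior, content) tuples
    """
    # An item "matches" when it asks for the same behavior on the same content.
    matched = [item for item in items
               if objectives.get(item[0]) == (item[1], item[2])]
    covered = {item[0] for item in matched}
    return {
        "item_match_rate": len(matched) / len(items),           # content validity
        "uncovered_objectives": sorted(set(objectives) - covered),  # comprehensiveness
    }


def decision_consistency(first_scores, second_scores, cutoff):
    """Share of students given the same pass/fail decision on both days."""
    same = sum((a >= cutoff) == (b >= cutoff)
               for a, b in zip(first_scores, second_scores))
    return same / len(first_scores)


# Hypothetical objectives and test items.
objectives = {
    "O1": ("operate", "wood lathe"),
    "O2": ("define", "performance test"),
    "O3": ("construct", "test item"),
}
items = [
    ("O1", "operate", "wood lathe"),
    ("O1", "describe", "wood lathe"),   # behavior differs from the objective
    ("O2", "define", "performance test"),
]
report = alignment_report(objectives, items)

# Hypothetical scores for the same four students on two days.
consistency = decision_consistency([18, 15, 9, 17], [17, 16, 10, 18], cutoff=16)

print(report)
print(consistency)
```

Here the second item asks students to describe rather than operate the lathe, so it does not count toward content validity, and objective O3 has no matching item, which the report flags as a gap in coverage.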



E. Standardized Tests 



1. The majority of schools in the United States
today make use of standardized tests of one
kind or another. Most tests of intelligence,
aptitude, personality, and interests are
standardized tests, made by specialists for a
test publisher, and sold by the publisher
throughout the country. Few schools or school
systems, except in very large city organizations,
attempt to develop such tests for their
own use.

2. The situation with respect to achievement
tests is somewhat different. There are, of
course, many standardized achievement tests on
the market, and literally millions of them are
used every year. These include tests in the
separate subjects or branches in addition to
the achievement batteries. However, teachers
usually feel that these tests do not adequately
measure their own or the local objectives of
instruction. Thus, while standardized tests
are very useful in some ways, they are not
usually the principal method of measuring
achievement. In general, it is the classroom
teacher or curriculum specialist who is
relied upon to formulate achievement tests.
It is important, therefore, that the teacher's
and specialist's professional training include
some instruction on effective ways of planning,
constructing, and evaluating various measuring
instruments.

3. Clearly, no standardized test of achievement
can serve the needs and purposes of every
local situation. The requirements for a
standardized test are such that the test must
be largely confined to instructional elements
common to a large number of schools. Such a
test cannot, therefore--if it is to be maximally
useful--include all those elements that are
peculiar to any one or even to a limited number
of schools. The most desirable and probably
the most common practice is to use both
standardized and teacher-made measuring instruments
in most situations (28).

4. Today, criterion-referenced achievement tests
are being "standardized," that is, developed
by specialists for a test publisher and sold
throughout the country to a large market.
Such tests will have the same problems as
other standardized achievement tests. They
won't adequately test the unique objectives
of the local situation. Teachers and curriculum
specialists will continue to rely largely
on the development of their own tests. When
the occasion does arise to select a standardized
criterion-referenced test, however, the
teacher or curriculum specialist must be
cautious that the behavioral criteria of test
items are spelled out clearly, that in fact
the test items measure what they say they are
intended to measure.*

(5) Instructional Systems Development for Vocational
and Technical Training, p. 100.
* See Discussion Question C in Part III.
(28) Foundations in Vocational Education: Reference
and Work Book for Trade and Technical Teacher
Education.
* See Classroom Activity 3 in Part III.



Study Activities

Based on your reading of the content outline and any additional references
as suggested, complete the following activities.

Basic Definitions (19) 

Educational evaluation: Educational evaluation refers to the determination 
of the worth of educational phenomena; it generally refers to the 
evaluation of an educational enterprise, such as an instructional sequence, 
not to the evaluation of students within that enterprise. Educational 
evaluation is a process of worth determination. 

Educational measurement: Educational measurement refers to the assessment 
of the current status of an educational phenomenon in a precise 
fashion--that is, counting or enumerating so that the phenomenon can be 
more accurately described--without placing value (goodness or badness) 
on the phenomenon thus described. Educational measurement is a process 
of status determination. 

Criterion-referenced testing: Criterion-referenced testing is a form of 
educational measurement that ascertains an individual's status with respect 
to some criterion or performance standard. Because the individual 
is compared with some established criterion, rather than with other 
individuals, these measures are described as criterion-referenced. 

Norm-referenced testing: Norm-referenced testing is a form of educational 
measurement that ascertains an individual's performance in relationship 
to the performance of other individuals on the same measuring device. 
Because the individual is compared with some normative group, such 
measures are described as norm-referenced. 
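
The contrast between the last two definitions can be sketched in a few lines of Python. This is only an illustration: the function names, the class scores, and the standard of 80 are assumptions made for the example and do not come from the module.

```python
from typing import Sequence


def criterion_referenced(score: float, standard: float) -> bool:
    """Compare the individual with a fixed performance standard."""
    return score >= standard


def norm_referenced_rank(score: float, group_scores: Sequence[float]) -> int:
    """Compare the individual with the group: rank 1 is the highest score."""
    return 1 + sum(1 for s in group_scores if s > score)


# Hypothetical class in which most students outscore the one we look at.
class_scores = [98, 96, 95, 94, 93, 92, 90, 85]
student_score = 90

# Same score, two interpretations.
print(criterion_referenced(student_score, standard=80))   # status against the standard
print(norm_referenced_rank(student_score, class_scores))  # position among classmates
```

The same score of 90 meets the assumed standard yet ranks near the bottom of this class, which is the distinction the definitions draw.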
Determine whether the following examples represent educational evaluation 
or educational measurement by marking an "X" at the appropriate 
choice. 

a. A vocational counselor directs a testing program that provides IQ 
scores and comprehensive achievement scores for each student in the 
district. 

___ a. evaluation 

___ b. measurement 

b. A vice-principal in an area vocational school observes a home economics 
class for a week and concludes that the instructor lectures too 
much, providing little opportunity for students to participate. 

___ a. evaluation 

___ b. measurement 

c. An industrial arts instructor administers an examination to determine 
if the students in his class have achieved the instructional 
objectives for a unit of instruction. 

___ a. evaluation 

___ b. measurement 

d. The principal of a comprehensive high school annually determines the 
comparative percentile ranks of all entering freshmen in English and 
mathematics. 

___ a. evaluation 

___ b. measurement 
Determine whether the following examples represent criterion-referenced 
testing or norm-referenced testing by marking an "X" at the appropriate 
choice. 

a. In the Red Cross Senior Lifesaving Test, an individual must demonstrate 
certain swimming skills to pass the examination, regardless 
of how well others perform on the test. 

___ a. criterion-referenced testing 

___ b. norm-referenced testing 

b. Although a business student scored 90% on an examination, he did not 
receive an A because a majority of the students in the class scored 
higher. 

___ a. criterion-referenced testing 

___ b. norm-referenced testing 

c. A test is used to determine the top 25 vocational students for a new 
vocational leadership program. 

___ a. criterion-referenced testing 

___ b. norm-referenced testing 

d. Students in a woodworking shop are required to pass a knowledge test 
of basic safety rules before operating any equipment in the shop. 

___ a. criterion-referenced testing 

___ b. norm-referenced testing 
Read the "Foreword" and Robert Glaser's article "Instructional 
Technology and the Measurement of Learning Outcomes: Some Questions" 
in Popham, Criterion-Referenced Measurement: An Introduction. Then 
complete the following questions. 

a. How did World War I psychology promote the use of norm-referenced 
measurement in education? 

b. Who coined the term "criterion-referenced measurement" and when? 

c. What factors in education do you think contributed to an increasing 
emphasis on criterion-referenced measurement? 

d. What form of measurement is primarily used in education today? 
How would you explain this phenomenon? 

e. What is Glaser's primary concern regarding the measurement of 
learning outcomes? 

Read the Special Report of the Association of California School 
Administrators on "The Nature and Uses of Criterion-Referenced and 
Norm-Referenced Achievement Tests," provided on the following pages. Then 
complete the questions below. 

a. In what sense is the concept of criterion-referenced measurement 
"new"? 

b. What are other current terms in use for criterion-referenced tests? 

c. Under what circumstances might a criterion-referenced test be 
considered "standardized"? 






Vol 4, No. 3 



Research & Evaluation Committee 

Robert [surname illegible], Chairperson 
Subcommittee: 

Donald Ross Green, CTB/McGraw-Hill 

Norman Ginsburg, Ocean View SD 

Harold Hyman, Compton Unified SD 

THE NATURE AND USES OF CRITERION-REFERENCED AND NORM-REFERENCED ACHIEVEMENT TESTS 

Introduction 

During the past few years, the term "criterion-referenced test" has come into increasingly wide use to the point where it is 
virtually a buzz word. Almost any discussion among school administrators of "what's new" will include allusions to 
criterion-referenced tests and perhaps domain-referenced tests, mastery tests, or objectives-based tests as well. These sorts 
of tests are usually contrasted with norm-referenced tests, standardized tests, traditional tests, or the like. It is 
particularly common to find that such discussions are confusing because people have different ideas about what these words 
and phrases mean and about what the various tests are good for. 

Most school officials hear criticism of their testing programs from all sides and consequently many are finding the various 
claims about criterion-referenced tests both enticing and disturbing. Are they the answer to the teachers' prayers? The 
principals'? What are they really? Just another fad? Why all the different terms? Can we now discard the traditional 
achievement tests? Are there handy books or articles answering these questions for school administrators? The answer to the 
last of these questions is "no" and hence this paper. 

Thus, the general purpose of this report is to enable general administrators to discuss these matters more knowledgeably with 
their staff and communities. The specific purposes are to (1) clarify the proliferating terminology (and perhaps stunt its 
growth) by enumerating common differences and distinctions in meaning between terms and between usages of these terms, 
(2) explain the essential nature of the difference between the two kinds of tests, and (3) offer a viewpoint on their uses in 
schools. The discussion is limited to achievement tests, and emphasizes what the terms imply about differences in how tests 
are constructed and how they may be used. It is hoped that readers of this report will be better able to infer what is meant 
when they encounter discussions of these ideas and will be better able to judge and use the various kinds of measures 
available. It may even be hoped that the result will be a general reduction in confusion. 

The confusion arises because there is a diversity of usage among the "experts." In this context, an expert is anyone who has 
written something about the two contrasting kinds of tests in a recognized journal or in a book. The contrasts and 
distinctions in usages and meanings cited in this report are largely composites and cannot be attributed to any one source. 
A list of the major references consulted is appended. Furthermore, we wrote to a number of scholars and organizations asking 
for their definitions. Those replying were Professors J. Stanley Ahmann, NAEP; Robert Ebel, Michigan State University; 
William Merz, Sacramento State College; Jason Millman, Cornell University; M. I. Chas. E. Woodson, University of California, 
Berkeley; and the Center for the Study of Evaluation, UCLA; CTB/McGraw-Hill; and Houghton Mifflin. We are grateful for these 
replies; all of them were thoughtful and useful. We might add, they make us feel confident that the problem just outlined 
does indeed exist. 



SPECIAL REPORT is published by the ASSOCIATION OF CALIFORNIA SCHOOL ADMINISTRATORS 



Neither norm-referenced nor criterion-referenced tests are particularly new. The Chinese used a species of norm-referenced 
tests for hundreds of years in their civil service program, and teachers in many places have used various kinds of 
criterion-referenced tests in their classes. Both kinds of tests have undergone changes and technical development in the last 
century. It is in the technical sense that criterion-referenced tests are rather new. Only in the last few years have those 
concerned with theoretical and technical issues of educational and psychological measurement, i.e., psychometricians, 
undertaken any sustained large scale effort to deal with criterion-referenced tests. It is even more recent that publishers 
of tests for schools have tried to offer any substantial tests of this sort. Therefore, consensus among the experts about 
technical requirements for the tests has yet to be reached and little about these matters can be found in courses and 
textbooks. 

To explore these issues, we first offer crude definitions of criterion- and norm-referenced tests. Then we try to dispose of 
some "red herring" problems created by the words themselves and consider the alternative terminology that has been suggested. 
Then the basic differences between the two kinds of tests are pointed out and related to some important technical issues in 
construction and test interpretation. The final section of the paper pertains to appropriate and effective uses of these two 
species of achievement tests. 

Definitions 

Norm-Referenced Measures 

Most writers agree that norm-referenced achievement tests use a sample of questions that refer to a broadly defined set of 
educational goals. Scores are meant to tell how much the student knows about that area or what level of ability he or she has 
attained. Because of the lack of a fully defined (enumerated) body of knowledge (try to list all knowledge about American 
history) or of a natural bottom and top level of ability (what is 100% computational skill?), score meaning is most readily 
obtained by comparing scores of students. If a well-defined group of students (the normative population) is sampled properly, 
any score can be compared with those of the rest of the population. The data derived from the scores of the sample, which 
permit these comparisons, are the norms. They may appear in many forms such as standard scores, percentiles, or grade 
equivalents. In any case, the norms are a consequence of the basic nature of the test; they do not determine that nature. 
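
As a rough illustration of how norms give a raw score its meaning, the sketch below converts a raw score into a standard score and a percentile rank against a normative sample. The sample values are invented, and the choices made here (the population standard deviation, and counting half of the tied scores in the percentile rank) are assumptions for the example, not procedures taken from the report.

```python
from statistics import mean, pstdev
from typing import Sequence


def standard_score(raw: float, norm_sample: Sequence[float]) -> float:
    """Distance of a raw score from the norm-group mean, in SD units (z score)."""
    return (raw - mean(norm_sample)) / pstdev(norm_sample)


def percentile_rank(raw: float, norm_sample: Sequence[float]) -> float:
    """Percent of the norm group scoring below raw, counting half of any ties."""
    below = sum(1 for s in norm_sample if s < raw)
    equal = sum(1 for s in norm_sample if s == raw)
    return 100.0 * (below + 0.5 * equal) / len(norm_sample)


# Hypothetical normative sample of raw scores.
norm_sample = [10, 20, 30, 40, 50]

print(standard_score(30, norm_sample))   # a score at the mean of the norm group
print(percentile_rank(30, norm_sample))  # the middle of the norm group
```

Neither derived score says what the student can do; each only locates the student relative to the normative population.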

Criterion-Referenced Measures 

A criterion-referenced achievement test provides a set of questions that refer to relatively restricted (i.e., specific) 
educational objectives. Performance on the items is meant to tell how much the student knows about the topic or how well he 
or she can perform the task. The specificity of the objective and the clarity with which it describes the behavior 
representing the objective to be achieved provides direct meaning to the scores. For example, the student knows (or does not 
know) how to identify the topic sentence of a paragraph. Criterion-referenced tests usually measure a large number of 
objectives, each treated separately. Since objectives are not equally important and some are contained in others, simply 
adding the scores together will not yield a score that means what the score on a regular achievement test does. 

Problem Words 

Norms 

Creation of norms for a criterion-referenced test is entirely possible. Just because the test is built to be directly 
interpreted relative to some performance standard does not preclude its also having norms and being used as a norm-referenced 
test. Especially if it is to be used for program evaluation this may be a reasonable procedure. However, since the test has 
probably been designed to be used for instructional planning and guidance, it is less likely to be as useful for other 
purposes. In any case, the existence of norms does not make the test a norm-referenced test. Fortunately, this source of 
confusion is rare. 

Criterion 

Analogously, one may define some score on a test constructed to be norm-referenced as the criterion score indicating mastery. 
Since the breadth of the scale usually makes this decision arbitrary and not something others would automatically understand, 
it merely creates confusion to say that this makes a test criterion-referenced. This sort of confusion is widespread and is 
complicated by the difficulty entailed in specifying the breadth of an objective or its domain. The comments of those who 
take this position indicate that they really maintain that all tests are alike and ultimately must be norm-referenced. This 
position is contradicted by the common sense interpretation used by many teachers with their own classroom tests. 

Standardized 

There is little unanimity of opinion about the meaning of the word standardized as a descriptor of tests. For some, it merely 
means tests with norms; for others, it means the test is (1) published, (2) normed, (3) has explicit instructions for 
administration, and (4) was constructed to meet technical standards. Still others leave out requirement 1 or requirement 2 or 
both. By this last definition, many criterion-referenced tests are standardized. Plainly, this term is not useful unless one 
specifies the definition being used whenever the word is employed. The most common useful definition (and the one that we 
use) addresses itself to test administration and scoring. Strict adherence to the author's or publisher's administration and 
scoring directions is necessary if reliance is to be placed on the results; failure to adhere to them means that norms cannot 
be used, comparisons with previous testings or other groups cannot be made, and so on. 

Alternatives to the Term "Criterion-Referenced" 

Let us return to the definitions of norm-referenced and criterion-referenced tests. It is apparent that the ideas are clear 
enough but the terminology is unfortunate. This is why all the other labels for criterion-referenced achievement tests exist. 
Yet it seems clear to us that matters have gone too far to change; the terms "criterion-referenced" and "norm-referenced" are 
too well established to be abandoned. The writers who use other words have excellent reasons for doing so, but usage is 
against them and they do not agree with each other. They do agree there are two general kinds of achievement tests although 
some of those responding to our questions preferred to describe these differences as a matter of degree and emphasis rather 
than kind. Nevertheless, it seems to us that a general consensus exists about the underlying ideas but that differences in 
terminology tend to obscure these agreements. 

Those who prefer the term "domain-referenced" are concerned about test construction procedures (see the next section) while 
those who use "mastery" emphasize a use. The term "objectives-based" elicits wide agreement but little enthusiasm. Generally 
speaking, those who use terms other than criterion-referenced appear to understand perfectly well what is meant; they simply 
do not like the terminology. Therefore, if one accepts the proposition that usage has already established norm-referenced 
and criterion-referenced, the problem is not those few who would use other labels, but the many who have erroneous notions 
about what these two terms mean. Let us, therefore, try to elaborate on the nature of these two kinds of tests. 

Characteristic Differences Between Criterion- and Norm-Referenced Tests 

The two kinds of tests may both have norms, may both have a particular criterion score designated as indicative of mastery, 
and may both be considered standardized. The two kinds often differ on these points but they do not have to do so. The 
basic differences arise in the construction procedures. 

First there are the content specifications for test construction from which content validity is established. The ideal 
process of building a traditional standardized achievement test is well known. Many textbooks describe it. The first step is 
to establish content specifications: experts in the area outline the topics to be included and indicate their relative 
importance. Items are written to fit the topics; importance of a topic is reflected by the number of items about the topic 
that are included. The test is usually designed to yield a few (e.g., 4 to 7) separate sub-test scores each based on perhaps 
20 to 50 items, as well as a total score. 

For criterion-referenced tests, the content specifications procedure is similar except that each objective or component of 
the topic to be measured is considered separately. The specifications indicate all the topics but give them no weights. A 
criterion-referenced reading test, for example, may include 40 to 60 objectives each of which is measured by perhaps five 
items. In effect, there are 40 to 60 separate tests and scores but no total score. 
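
A minimal sketch of this kind of scoring follows: each objective is scored on its own and no total is computed. The objective names, the item results, and the mastery cutoff of 4 items right out of 5 are assumptions made for the illustration; the report does not specify a cutoff.

```python
from typing import Dict, List


def objective_report(responses: Dict[str, List[int]], mastery_cutoff: int = 4) -> Dict[str, dict]:
    """Score each objective separately from its own 0/1 item results.

    No total score is produced, because objectives are unweighted and
    each one is interpreted on its own.
    """
    report = {}
    for objective, items in responses.items():
        right = sum(items)
        report[objective] = {
            "right": right,
            "of": len(items),
            "mastery": right >= mastery_cutoff,  # assumed cutoff, not from the report
        }
    return report


# Hypothetical results for one student on two five-item objectives.
responses = {
    "identify the topic sentence": [1, 1, 1, 1, 0],
    "identify the main idea": [1, 0, 0, 1, 0],
}

for objective, result in objective_report(responses).items():
    print(objective, result)
```

A real reading test of this kind would simply carry 40 to 60 such entries per student rather than two.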

Content validity is judged in both cases by examining the adequacy of content coverage, but in the traditional test the 
appropriateness of the emphases is an important criterion for judging. Content validity of both types of tests is also judged 
by examining how well the items fit the categories or objectives they are meant to measure. 

In short, the only difference in content criteria is in the weighting and the breadth of content represented by any one 
score. This difference may not appear huge, but it represents two very different views of the measurement task. For 
norm-referenced tests, there is assumed to be a trait or capability (e.g., reading skill) that individuals have in different 
amounts because of different amounts of learning. The measurement task is to place people on a scale of that trait; the scale 
is usually established by norms. For criterion-referenced tests, a behavior is described that occurs under certain conditions 
when a student has achieved the objective; the measurement task is to determine if a student has achieved the objective. When 
objectives are complex and broad, one can measure degrees of achievement and the two sorts of tests begin to look alike. 
However, as long as it is possible to interpret scores as direct descriptions of achievement, the test can be called 
"criterion-referenced." 

The second step in test construction is to write items to fit the content specifications. Generally speaking, the process 
does not differ for the two sorts of tests. However, some writers feel strongly that they should differ. These are the 
advocates of the term "domain-referenced," which refers to their item writing procedures. 

The third step in test construction is to try out the items and select the best ones to make up the test. Here the 
differences between the two kinds of tests are more numerous. On standardized achievement tests, the "best" items are usually 
those that: (1) discriminate well, i.e., are answered correctly by generally high scoring students and incorrectly by low 
scorers (the point-biserial coefficient, an item-test correlation, is one common index of this); (2) show growth from grade 
to grade (most achievement batteries provide different levels of the test for every one, two, or three grades); (3) are about 
in the middle in difficulty, e.g., on a multiple-choice test, about 6[?]% of an average group would get the item right (this 
is the average one aims for but it is helpful to have some easy items and some hard ones); (4) if the item is 
multiple-choice, each alternative (distractor or foil) should be chosen by many of the lower scoring students. 
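
Two of the item statistics named above, difficulty and the point-biserial coefficient, can be sketched as follows. The tryout data are invented, and the formula shown is the standard textbook form of the point-biserial (using the population standard deviation of total scores); the report itself gives no formula, so treat the details as assumptions.

```python
from statistics import mean, pstdev
from typing import Sequence


def item_difficulty(item: Sequence[int]) -> float:
    """Proportion of students answering the item correctly."""
    return sum(item) / len(item)


def point_biserial(item: Sequence[int], totals: Sequence[float]) -> float:
    """Correlation between a 0/1 item and the total test score.

    Undefined (raises an error) if every student got the item right,
    every student got it wrong, or all total scores are equal.
    """
    p = item_difficulty(item)
    mean_right = mean([t for i, t in zip(item, totals) if i == 1])
    mean_wrong = mean([t for i, t in zip(item, totals) if i == 0])
    return (mean_right - mean_wrong) / pstdev(totals) * (p * (1 - p)) ** 0.5


# Hypothetical tryout: six students, one item, and their total test scores.
item = [1, 1, 1, 0, 0, 0]
totals = [9, 8, 7, 4, 3, 2]

print(item_difficulty(item))         # middle difficulty
print(point_biserial(item, totals))  # high: right answers go with high totals
```

An item like this one, answered correctly only by the higher scorers, is the kind a norm-referenced test builder would keep.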

In criterion-referenced tests, by contrast, one looks for an item that will discriminate between students who have and have 
not achieved the particular objective it is measuring but there is no concern with its relation to other objectives. One also 
looks for items that will show mastery immediately after the objective is achieved. Thus, if an objective is taught during a 
week, many if not most students should fail the item before that week's instruction and pass it after (assuming that the 
instruction is adequate). This characteristic is called "sensitivity to instruction." Determination of sensitivity to 
instruction requires a two-stage tryout, one before and one after relevant instruction; a single-stage tryout with a 
treatment (taught) and a control (not taught) group can provide similar information provided that assignment to treatments is 
done properly. A traditional achievement test has only one tryout and its timing relative to instruction is rarely 
considered. Selection of items that are sensitive to instruction means that a criterion-referenced test will reflect learning 
as it happens, something that regular achievement tests rarely do. It takes six or eight months of schooling for most 
norm-referenced test scores to reflect significant changes because they measure many broad and complex goals all at once. 
This is one reason why criterion-referenced tests can be more effective than norm-referenced tests when used to guide 
instruction and for looking at programs internally while in progress. Norm-referenced tests on the other hand are efficient 
when used to evaluate program outcome because of their long-term summative character. Finally, one looks for items that will 
be difficult for those who have not achieved the objective and easy for those who have. "Middle" difficulty items are not 
particularly desirable. This is equivalent to the preceding point but contrasts with norm-referenced items. 
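
One simple way to express sensitivity to instruction from a two-stage tryout is the gain in proportion correct from the first stage to the second. The sketch below uses that pre-to-post difference; it is an assumed index chosen for illustration, since the report names the characteristic but gives no formula, and the tryout results are invented.

```python
from typing import Sequence


def proportion_correct(results: Sequence[int]) -> float:
    """Share of students passing the item (results are 0/1)."""
    return sum(results) / len(results)


def sensitivity_index(pre: Sequence[int], post: Sequence[int]) -> float:
    """Post-instruction minus pre-instruction proportion correct.

    Values near 1 mean most students failed before and passed after;
    values near 0 mean instruction made little difference to the item.
    """
    return proportion_correct(post) - proportion_correct(pre)


# Hypothetical two-stage tryout of one item with ten students.
pre_instruction = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
post_instruction = [1, 1, 1, 1, 0, 1, 1, 1, 1, 1]

print(sensitivity_index(pre_instruction, post_instruction))
```

The same function could be applied to a taught group and an untaught group from a single-stage tryout, provided students were assigned to the groups properly.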

Table 1 summarizes some of the similarities and differences between the two kinds of achievement tests. Note that most of 
these characteristics are more properly labeled "typical" than necessary. These differences in nature and construction point 
to different uses of the two kinds of tests. 

Table 1 

Characteristic Similarities and Differences between Norm-Referenced 
and Criterion-Referenced Achievement Tests 

Content Specifications 

  Norm-Referenced Tests 
  1. Topics outlined and weighted according to importance; number of items per topic is directly proportional to importance. 
  2. Both omission of important content and inclusion of unimportant content are serious flaws that distort meaning of scores. 
  3. Test usually covers broadly defined educational goals that represent the most widely adopted school curricula. 
  4. Altering a test to fit a specific local curriculum is very difficult; it is usually easier to build such a test 
     from scratch. 

  Criterion-Referenced Tests 
  1. Topics broken down into specific educational objectives; number of items per objective is usually constant. In any 
     case, all objectives are equally represented since each has its own score. 
  2. Omission of important topics reduces overall value of instrument but does not affect meaning of scores. Unimportant 
     objectives can be ignored. 
  3. Test covers a set of specific educational objectives. 
  4. The set of objectives used may be easily selected or modified to fit local curricula. 

Item Writing Specifications 

  Norm-Referenced Tests 
  1. Items are usually written to learning objectives which represent a sample of those relevant to the goals being 
     measured. Each goal is systematically sampled but objectives are not. 
  2. Single items often require knowledge of several aspects of the content. 

  Criterion-Referenced Tests 
  1. Items are written to learning objectives; each objective is systematically sampled. 
  2. Items refer only to the objective to which they are written. 

Desirable Item Characteristics 

  Norm-Referenced Tests 
  The best items are those that: 
  1. discriminate well between those who score high and those who score low on the test, 
  2. show growth from grade to grade, 
  3. are about midrange in difficulty (but some items at each extreme are also desirable). 

  Criterion-Referenced Tests 
  The best items are those that: 
  1. discriminate between those who have and have not had effective instruction to that objective, 
  2. show mastery immediately after the objective has been achieved, 
  3. have preinstruction difficulties approaching 0 (almost all get them wrong) and postinstruction difficulties 
     approaching 1 (almost all get them right). 

Administration 

  Norm-Referenced Tests 
  1. Standardized conditions of administration are essential including control of time (sometimes tests are speeded but 
     not always). 
  2. Parts cannot be omitted without damage to meaning of total. 

  Criterion-Referenced Tests 
  1. More latitude in conditions is permissible. Control of time is rarely appropriate (unless speed is part of task). 
  2. Parts can be omitted at will since there is no total score. 

Scores 

  Norm-Referenced Tests 
  1. Raw scores rarely have much direct meaning. 
  2. Measurement places person on hypothetical scale of amount of trait. 
  3. Scale usually established by norms (comparative performances). 
  4. Derived scores are used such as standard scores, percentile ranks, grade equivalent scores. 
  5. Score reports usually imply value, i.e., performance was good or poor. 
  6. All items contribute to part and total scores. 

  Criterion-Referenced Tests 
  1. Raw scores have some direct meaning about achievement of the objective being measured. 
  2. Measurement refers to scales based on visible performance. 
  3. Scale is usually established by judgment and convention concerning adequate and inadequate performance but norms 
     may exist and help. 
  4. Scores used are number right, and categories such as mastery and nonmastery. 
  5. Score reports are less well adapted to making conclusions about the quality of student or program performance. 
  6. Each objective has its own score; meaningful total scores are usually not possible. 

Score Distributions 

  Norm-Referenced Tests 
  1. Score distributions that are approximately normal are desirable. 
     [Figure: approximately normal distribution of scores] 

  Criterion-Referenced Tests 
  1. Score distributions that are skewed are desirable. 
     [Figures: skewed score distributions, preinstruction and postinstruction] 
     If the group tested includes both preinstructed and instructed students, distribution should be U-shaped. 
     [Figure: U-shaped score distribution, mixed group] 

Reliability 

  Norm-Referenced Tests 
  1. Test-retest coefficients should be high for each score. 
  2. Internal consistency coefficients should be substantial for each score. 

  Criterion-Referenced Tests 
  1. Test-retest coefficients should be high for each objective in a mixed sample (as above). 
  2. Internal consistency coefficients should be high for each objective in a mixed sample. 

Content Validity 

  Norm-Referenced Tests 
  1. Content coverage and emphasis should be judged adequate. 
  2. Fit of items to their intended content category is a matter of judgment. 

  Criterion-Referenced Tests 
  1. Coverage of behavior specified by objective should be adequate. 
  2. Fit of items to their intended content category is a matter of judgment. 

Construct Validity 

  Norm-Referenced Tests 
  1. Scores show growth during years of school attendance. 
  2. Scores show greatest growth during years of relevant instruction. 
  3. Groups with more training average better than groups with less. 
  4. High scoring students can more often solve problems requiring the knowledge than low scoring students. 
  5. Relationships among items should correspond (show patterns) to relationships among content categories (e.g., 
     results of factor analyses should be logical). 

  Criterion-Referenced Tests 
  1. Scores for objectives exhibit sensitivity to instruction, i.e., change from wrong to right after effective 
     instruction. 
  2. Items for one objective are more closely related than across objectives. 
  3. General background plays less role than in norm-referenced tests (this implies less cultural bias). 
  4. High scoring students can more often solve problems requiring the knowledge than low scoring students. 
  5. Relationships among items should correspond (show patterns) to relationships among content categories (e.g., 
     results of factor analyses should be logical). 
  6. Where a learning hierarchy is known to exist, performance on higher objectives will predict performance on lower 
     order objectives, and demonstrated mastery of lower order objectives facilitates learning of higher order 
     objectives (e.g., positive vertical transfer). 

Criterion Related Validity 

  Norm-Referenced Tests 
  1. Scores correlate well with other measures of achievement such as teachers' marks and other tests. 
  2. Scores predict performance in class or on tasks dependent on capabilities being measured. 

  Criterion-Referenced Tests 
  1. Scores correlate well with other measures of the objective. 

Uses 

  Norm-Referenced Tests 
  1. Assessment of status of school system (or classes or students) with respect to achievement in basic skills and 
     content areas. 
  2. Program evaluation for outcomes of long-term growth (at least 6 months) towards major goals. 
  3. Selection and placement of students in courses and programs on the basis of level of basic skills or general 
     knowledge of content. 
  4. Information for curriculum planning. 
  5. Monitoring yearly progress of schools and school systems with respect to goals. 

  Criterion-Referenced Tests 
  1. Assessment of status of students (or classes or school system) with respect to curriculum objectives. 
  2. Program evaluation for long- or short-term attainment of specific objectives. 
  3. Diagnosis of instructional needs of individual students and groups of students. 
  4. Information for planning of classroom instruction. 
  5. Monitoring progress of students with respect to instructional objectives. 

Appropriate and Effective Uses 

There is a wide variety of ways one may design testing programs using norm-referenced and criterion-referenced tests. Because 
the possibilities are so many, we shall describe a particular approach now being tried in the Ocean View School District of 
Orange County. We believe it is a conceptually sound approach and is feasible in at least some systems. Other approaches 
equally effective unquestionably exist. Although our purpose is to illustrate possibilities, not prescribe, we do have some 
convictions about test programs which this approach reflects. 

In particular, we believe that in this age of individualized instructional programs school districts throughout California 
should be using both kinds of tests in a complementary sense. This is because of the large variety and quantity of 
information that is needed if such programs are going to function adequately. On the one hand school administrators need to 
know how well their students are performing collectively in the basic skill areas vis a vis a benchmark independent of their 
district. Without this information it is not possible for them to report responsibly to the school board, to parents, or to 
the public. By the same token without such information these latter groups cannot make responsible decisions about the 
policies and support they must provide the schools. Without this information program planning, or revision and improvement, 
is handicapped. 

On the other hand, a different sort of information is needed for planning and evaluating instructional activities. Site 
administrators and program supervisors but especially teachers and students need specific information about student 
achievement of instructional objectives. Bright students, sometimes helped by well educated parents, often figure out for 
themselves just where they stand but many students need unambiguous feedback from the teacher about their efforts to learn. 
Teachers who try seriously to do this on some real basis other than rule of thumb need help in getting this information 
especially if they are trying to individualize instruction. 

We believe that tests are the most reliable, valid and efficient source of much (although not all) of these kinds of 
information. Carefully constructed and validated tests properly used can provide information good enough to say that the time 
and money required to administer the tests is worthwhile and that the program or instructional decisions are more likely to 
be correct than if the decision maker chose not to test. Thus district administrators, site administrators, and teachers all 
need to know how to use tests effectively. They need to know how to proceed efficiently and how not to interfere with each 
other's efforts or with student learning. The Ocean View program is intended to achieve this condition. 

The Ocean View Program

Ocean View School District has had a well established commitment to tests and program evaluation for a number of years.
From [illegible] to 1974, despite shrinking budget options, the district has expended between $[illegible] and $5 per student per year on
tests and related evaluation activities to meet the consistent and insistent demands for cognitive student information. The
school district has consistently exceeded state requirements for testing for the benefit of its students, teachers, administrators,
parents, school board, and taxpayers.

In Ocean View, as elsewhere, the school board has the responsibility to identify the general parameters of the curriculum.
Working from these general guidelines, curriculum directions and strategies must be identified at the administrative and teaching
level. Concomitantly, evaluation instruments and strategies are identified to fit the curriculum parameters adopted by the
board. Like most school districts, Ocean View uses a norm-referenced test to gain a broad picture of how well the district is
achieving in its curriculum in the basic skills areas of reading, language, and mathematics. In those areas in which the district
is doing well, it is assumed that district teaching efforts are effectively meeting the goals of the educational program. If, on
the other hand, the district shows a weakness, an in-depth study of the problem is then initiated. A norm-referenced item
analysis can be part of such a study; however, when one needs to translate identified group program weakness into individual
student profiles for corrective action, a criterion-referenced test is the best measure.

Therefore, the district program includes criterion-referenced tests. Criterion-referenced tests with their interim tests* can
assist teachers, regardless of grade level, to know on a regular basis how well a student is doing in relation to subject area
objectives. Criterion-referenced tests indicate specific student educational deficits or strengths on which the teacher can base a
prescription of daily educational tasks for that student. Individual instructional programs can be derived from criterion-
referenced test data if the objectives which the test measures are explicitly those in the curriculum plan. In this sense teachers
teach to the test.

In short, Ocean View has concluded that a basic test program should include both norm-referenced and criterion-referenced
tests. The district uses one of the traditional norm-referenced achievement batteries in Grades 1 through 8 to obtain a broad
picture for its program evaluation. The intention is to use criterion-referenced tests for diagnostic/prescriptive purposes with
individual students. In the area of mathematics, a published criterion-referenced test has been adopted and is now in use. The
test results are used to translate general norm-referenced information into a meaningful instructional picture for both the
teacher and site administrator. In reading, the district criterion measurement committee is currently exploring various
published programs that can be adapted to meet Ocean View's needs.

*Interim tests are short quizzes designed to measure a single objective and are usually parallel to their corresponding part in a
more comprehensive criterion-referenced test. They are useful for monitoring the daily or weekly progress of individual
students.

Difficulties in Test Usage

There are problems generated by an extensive testing program. One is the recurring professional argument as to whether a
norm-referenced achievement test should regulate the curriculum. Most say it should not. Nevertheless, if one is attempting
to measure achievement, the test content must have a close relationship to the curriculum taught. In fact, the Ocean View
experience suggests that it is not too hard to get a substantial degree of consensus from teachers concerning the appropriate-
ness of most of the items found in the basic skills measures used. When teachers question whether a given achievement test is
regulating the curriculum, it is helpful to ask them to examine the particular items and the skills they are intended to measure.
Specific disagreements about relevance and appropriateness are rarely found. Skills such as knowing how to identify main
ideas, how to use punctuation properly, or how to perform basic arithmetic operations are pretty nearly universally accepted.
The broad general content of a norm-referenced test is so commonly agreed upon vis-a-vis appropriate curriculum areas in
public schools that getting agreement is not difficult when the time is taken to do so.

Using both kinds of tests creates a second problem, namely interrelating the results of the two kinds of instruments. The two
do yield somewhat different kinds of information: an overall summative judgment on the one hand and a specific diagnostic
picture on the other. It is important to help personnel see the interrelationships among the two sets of data and not come to
feel they are in conflict. This also calls for careful examination of test materials by teachers and administrators. By carefully
determining correspondences and differences in content and emphasis, criterion-referenced tests can be used to translate
general norm-referenced information into a meaningful instructional picture. The strengths and weaknesses of both kinds of
information can be exhibited when this is done. It is evident that administrators must help teachers in this task.

The third problem occurs in the subsequent task of relating the information to instructional activities. The quantity of
information is huge and teachers cannot use it without help. A particular need arises to assist teachers in a supportive sense as they
gather materials and equipment and organize their students for instruction.

Finally, we may add that when a large degree of usefulness of the district testing program is developed, the problems created by
external testing demands are reduced. This is because students and teachers (a) perceive tests as having some use and value and
(b) tend to understand more fully what the data do and do not mean, with a consequent reduction in fear and distaste.

Conclusion

In brief, the key to the effective use of tests is a balanced use of each type of test. Norm-referenced tests are used best for
group decisions. Criterion-referenced tests are used best for individual diagnostic-prescriptive decisions.

When evaluation questions arise concerning comparisons of groups of students for purposes of making general curriculum
decisions, norm-referenced tests are usually most appropriate. When evaluation questions arise concerning the educational progress
of a student for purposes of prescribing an individualized instructional program, criterion-referenced tests are usually most
appropriate.
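
The two kinds of interpretation can be illustrated with a short sketch. The Python below is not part of the Special Report. The function names, the norm-group scores, the objectives, and the 80 percent mastery cutoff are all assumptions chosen for illustration. The first function reads a score against a norm group (a percentile rank); the second reads item results against each objective (a mastery decision).

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_scores):
    """Norm-referenced reading: percent of the norm group scoring below
    the given score, counting half of any tied scores."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


def mastery_profile(item_results, cutoff=0.8):
    """Criterion-referenced reading: one mastery decision per objective.

    item_results maps an objective name to a list of 0/1 item scores.
    The cutoff is an assumed mastery level, not one given in the report.
    """
    return {objective: sum(items) / len(items) >= cutoff
            for objective, items in item_results.items()}


# Made-up data for illustration only.
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
student_items = {
    "identify main idea": [1, 1, 1, 1, 0],
    "use punctuation": [1, 0, 0, 1, 0],
}

print(percentile_rank(25, norm_group))
print(mastery_profile(student_items))
```

The percentile rank supports a group comparison, while the per-objective profile points to the specific skill on which a prescription for the individual student could be based.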

*"Special Reports" are intended to present information of a practical value to school administrators in California.
It should be recognized that (a) the applicability and value of such information may vary from district to district
in a state as diverse as California, and (b) the viewpoints expressed in "Special Reports" are those of the authors
and not necessarily those of the Association of California School Administrators.





Having read the Special Report of the Association of California School 
Administrators on "The Nature and Uses of Criterion-Referenced and Norm- 
Referenced Achievement Tests," complete the following exercise by decid- 
ing whether the characteristics indicated describe a norm-referenced test 
or a criterion-referenced test. Mark an "X" at the appropriate choice. 

a. With this type of test, the test items refer only to the specific ob- 
jectives for which they were written. 

a. norm-referenced test 

b. criterion-referenced test 



b. With this type of test, the best test items are those that discrimin- 
ate between individuals who score high and individuals who score low 
on the test. 

a. norm-referenced test 

b. criterion-referenced test 



c. With this type of test, the number of test items per topic is deter-
mined on the basis of the relative importance of that topic.

a. norm-referenced test 

b. criterion-referenced test 



d. This type of test determines whether or not students have achieved 
specific instructional objectives. 

a. norm-referenced test 

b. criterion-referenced test 



e. This type of test determines a student's status with respect to
other students in the achievement of basic skills and content areas.

a. norm-referenced test 

b. criterion-referenced test 



f. With this type of test, meaningful total scores on all test items
are usually not possible.

a. norm-referenced test

b. criterion-referenced test

6. Read the Popham and Husek article, "Implications of Criterion-Referenced
Measurement" in Popham, Criterion-Referenced Measurement: An Introduction.
Then complete the following exercise by distinguishing between
norm-referenced measurement (NRM) and criterion-referenced measurement
(CRM) on the bases indicated below.

a. VARIABILITY
CRM and NRM

b. ITEM CONSTRUCTION 
CRM and NRM 

c. RELIABILITY
CRM and NRM 

d. VALIDITY 
CRM and NRM 

e. ITEM ANALYSIS 
CRM and NRM 

f. REPORTING AND INTERPRETATION 
CRM and NRM 

(See Appendix A for possible answers. ) 
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Goal 9.2 



Content Outline 



Activities-Resources 




A. Measuring Instruments for the Cognitive Domain 

1. Written tests, whether they be teacher-made 
or standardized, play a central role in the 
testing of knowledge. 

2. According to Butler, there are three basic 
types of criterion test items that can be 
used to test knowledge. These types are: 

a. "A test item may be directive or
imperative. For example, 'Find the
value of R in an electrical circuit if
I=30 amperes and E=110 volts.'

b. An item may be a completion type,
requiring the student to select from
several possible choices the one
that he thinks will correctly complete
the stem of the item. For example,
'Excessive backlash in the differential
assembly of an automobile would most
likely be caused by'... (followed by a
blank space or four alternative responses).

c. An item may ask a direct question.
For example, 'What three types of
meter functions are combined in a
multimeter?'" (5).

(5) Instructional Systems Development for Vocational and Technical Training.

3. These three basic types of criterion test
items can be adapted to measure all the
different kinds of acquired knowledge. As
Butler points out, however, it is important to
measure the students' knowledge by testing
their ability to apply that knowledge to the
problems they will encounter on the job,
rather than by the mere recall of isolated
facts. To do this, Butler strongly
recommends the method of providing hypotheti-
cal situations and then asking practical,
objective questions about courses of action
that should be taken in the situations (5).

B. Measuring Instruments for the Affective Domain 

1. According to Pucel and Knaak, attitude
measurement is one of the more complex
types of measurement. People have attempted
to measure attitudes for years and have
developed very complex assessment procedures
that have had only minimal success (21).

2. According to Armstrong et al., "Perhaps
the most appropriate instruments for
measuring development toward course
objectives in the affective domain are
scales and techniques developed by the
classroom teacher. By carefully following
suggestions of experts in the area of
measurement and with practice, the teacher
may become quite proficient in this skill.
Some of the more usable and reliable tech-
niques include Likert Scales, Semantic
Differential Scales, Sociometric Techniques,
Rating Scales, and Behavior Checklists" (1).

(5) Instructional Systems Development for Vocational and Technical Training.

(21) Individualizing Vocational and Technical Instruction, p. 186.

3. In the measurement of affective behavior, 
teacher observation and teacher judgment are 
also utilized. Teacher observation is 
described as a technique that systematically 
categorizes the behavior under consideration. 

4. Teacher judgment can be utilized if the
teacher constructs a rating scale or a check-
list to be used in determining if the
behaviors under consideration are being
exhibited according to a given set of
criteria.

5. Although teacher observation and judgment
are common ways to measure affective behavior,
subjectivity tends to be a very critical
problem (1).*

(1) The Development and Evaluation of Behavioral Objectives, p. 69.

(1) As above, p. 67.

* See Discussion Question D in Part III.

C. Measuring Instruments for the Psychomotor Domain

1. According to Armstrong et al., "Measurements
in the cognitive and affective domains of
behavior assess what might be called internal
behaviors. However, since the psychomotor
domain deals primarily with external behav-
iors, the measuring techniques differ in
some respects from those used in the cognitive
and affective domains. Basically, this
difference involves how the responses to the
measuring instrument are recorded. Rather
than having the individual respond to the
instrument directly as in the cognitive
and affective domains, usually another
person is required to observe this individual
performing the given psychomotor skills under
consideration and then record the observed
performance" (1).

2. Measuring instruments that are available in
the psychomotor domain include observation
systems, rating scales, and checklists.

(1) The Development and Evaluation of Behavioral Objectives, p. 80.



D. Types of Written Test Items

1. Basically, written test items can be classified
as either objective or essay.

2. Objective questions present the learner with
a very structured situation that limits the
type of response he makes. He must either
select the correct answer from several
alternatives, supply the correct answer, or
determine the truth or falsity of a given
statement. Types of objective test items
include:

a. multiple-choice items;

b. matching items;

c. true-false items;

d. completion items.

In general, objective test items are easier
to administer, quicker to score, and provide
more objective results than essay items.

3. Essay questions, on the other hand, measure 
the student's ability to select, organize, 
and integrate ideas. According to Butler, 
"Because the goal of criterion testing is 
to determine whether the student can meet 
the requirements of the objective and 
not whether he can write extensively and 
well, essay items have little or no 
place in a criterion test. This fact, 
coupled with the need to score the tests 
objectively, should lead you to reject 
essay items in most cases" (5). 

(5) Instructional Systems Development for Vocational and Technical Training, p. 199.
For a more detailed discussion of the advantages and disadvantages of objective
items (and the various types of objective items) and essay items see:
(21) Individualizing Vocational and Technical Instruction, Chap. 6;
(9) Designing Objective, Essay, and Performance Tests; and
(8) Home Economics Evaluation, Chaps. 6-10.

E. The Performance Test

1. The assessment of student performance in
vocational education may take two forms:

a. the performance test, and

b. product evaluation.

2. The Performance Test. A performance test is
a test that requires a student to accomplish
a job-like task under controlled conditions,
controlled conditions meaning those that
will give the student the best possible
chance to display the skill the test is
to measure, and those that do not change from
one student to another (5).

a. Performance tests are used when the
instructor is interested in determining
if the student can perform the correct
process and if he can produce the correct
product.

b. The test requires the instructor to
observe the student as he is completing
the process; therefore, the instructor
can also determine if the product is
correct.

c. Since these tests must be administered on
a one-to-one basis, they present certain
difficulties for the instructor, who
cannot be responsive to other students in
the class at the time of performance
testing.*

(5) As above, p. 127.

* See Discussion Question E in Part III.

3. Product Evaluation. Product evaluation
is used when the instructor is primarily
interested in whether the student can
produce the correct product. It doesn't
allow the instructor to assess the
process that created the product--that is,
the correct product may be completed by a
correct or an incorrect process. It does,
however, free the instructor to evaluate
the performance of students after class or
during periods when students do not require
assistance.

4. Basic procedures for developing a performance
evaluation instrument are:

a. Specify the objective.

b. Determine if you want to evaluate the
performance with a performance test
or a product evaluation.

c. If a performance test is used, list the
procedural steps. If a product evaluation
is used, list the points to be observed
after the performance is completed.
(Make sure that the steps or points are
independent, that each contains only
one performance, that each begins with
a verb indicating the behavior expected
of the student, and that all steps are
listed.)

d. Identify critical items.

e. Determine if you need instructor check-
points when using a product evaluation.

f. Determine the criteria for judging
satisfactory completion of each step.

g. Establish the acceptable mastery level
score for the instrument (21).

(21) Individualizing Vocational and Technical Instruction, p. 185.
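
As a rough illustration of steps c through g, the sketch below scores one student against a checklist. It is written in Python and is not taken from the cited reference. The `Step` class, the `evaluate` function, the power-saw steps, and the 0.75 mastery level are hypothetical, and the pass rule (reach the mastery level and miss no critical item) is an assumption about how critical items might be treated.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One procedural step (performance test) or point (product evaluation)."""
    description: str        # begins with a verb naming the expected behavior
    critical: bool = False  # True if the step is identified as a critical item


def evaluate(steps, observed, mastery_level):
    """Return (score, missed_critical, passed) for one student.

    observed holds one True/False judgment per step, in the same order.
    Assumed rule: the student passes only if the proportion of steps done
    correctly reaches mastery_level and no critical item was missed.
    """
    if len(steps) != len(observed):
        raise ValueError("need exactly one observation per step")
    score = sum(observed) / len(steps)
    missed_critical = [step.description
                       for step, ok in zip(steps, observed)
                       if step.critical and not ok]
    passed = score >= mastery_level and not missed_critical
    return score, missed_critical, passed


# Hypothetical checklist for operating a power saw.
checklist = [
    Step("Put on safety glasses", critical=True),
    Step("Check the blade guard", critical=True),
    Step("Mark the cut line"),
    Step("Cut along the marked line"),
]

print(evaluate(checklist, [True, True, True, False], mastery_level=0.75))
print(evaluate(checklist, [False, True, True, True], mastery_level=0.75))
```

Both students complete three of four steps, but under the assumed rule only the first passes, because the second missed a critical item.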

Study Activities

Based on your reading of the content outline and any additional references
as suggested, complete the following activities.



Techniques for Assessing Instructional Objectives in the Domains of
Learning (22)

The following material enumerates some techniques that might be useful 
in assessing student achievement of instructional objectives in the three 
domains of learning: cognitive, affective, and psychomotor. 

Assessment Techniques for the Cognitive Domain 

The following techniques are appropriate for assessing student achieve- 
ment of instructional objectives concerned with course content and fac- 
tual information: 

1. noting written or oral response to selected questions or issues 
listed in a pre-test or an exit test; 

2. using teacher-made written tests consisting of objective-type 
questions ; 

3. having students prepare a short paper or essay with standards 
and criteria for assessment; 

4. having a student chair or serve as a member of a committee, 
preparing and presenting a report on some aspect of a unit of 
instruction; 

5. assessing a student's response to questions raised by an in- 
structor in a group instruction review. 

Assessment Techniques for the Affective Domain

The following technique is appropriate for assessing student achievement
of instructional objectives concerned with the interests, attitudes,
appreciations, and adjustments of the learner: using an attitude check-
list that specifies behavioral criteria for judging student achievement of
appropriate job-related attitudes.
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Assessment Techniques for the Psychomotor Domain 

The following techniques are appropriate for assessing student achieve- 
ment of instructional objectives concerned with motor skills: 

1. observing the student as he demonstrates a skill or the 
application of knowledge; 

2. assessing a finished product that required the use of the 
psychomotor skills being assessed; 

3. using a performance test in which the student demonstrates 
the psychomotor ability as part of the test. 

1. Complete the following multiple-choice questions by marking an "X" by
the specific learning domain being tested by the assessment technique
described.

a. As part of a performance test, an instructor observes a student to
determine whether or not he is able to operate a power saw, following
correct procedures.

a. cognitive domain

b. affective domain

c. psychomotor domain

b. A teacher prepares a series of multiple-choice questions to test the
student's knowledge of the various types and uses of power saws.

a. cognitive domain

b. affective domain

c. psychomotor domain

c. With a list of characteristics that describe a safety-conscious
individual, an instructor observes a group of students working in a
wood shop to determine whether or not these students are safety
conscious.

a. cognitive domain

b. affective domain

c. psychomotor domain

d. As part of a performance test, an instructor observes a group of
retail sales trainees in a roleplaying session to determine whether or
not they are able to establish good rapport with customers.

a. cognitive domain

b. affective domain

c. psychomotor domain

e. An instructor selects a standardized test consisting of matching
items that determines whether or not dental assisting students are
able to identify basic tools used by the dentist.

a. cognitive domain

b. affective domain

c. psychomotor domain

Types of Tests

In vocational education, there are two basic types of tests:

1 . written tests, and 

2. performance tests. 

Written tests are designed to measure achievement of objectives primar- 
ily in the cognitive domain. Performance tests are designed to measure 
achievement of objectives primarily in the psychomotor domain. However, 
aspects of these tests may also be used to measure achievement of
objectives in the affective domain. Their primary use is our major
concern here.

Written tests consist of test questions that can be classified into two 
major groups : 

1. objective test questions, and 

2. subjective (essay) test questions. 

Both basic types of tests, written tests and performance tests, may be
either standardized or nonstandardized; that is, they may be purchased
from a commercial test publisher, or they may be prepared by the
individual instructor (teacher-made).
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Keep these distinctions in mind as you proceed through this module. 

Objective Tests and Subjective Tests: Advantages and Limitations (9)

1. An objective test is a type of test so designed that the score can
be determined objectively and will be essentially the same regard-
less of who determines the score. Typical objective test questions
include true-false, multiple choice, completion, matching, and
pictorial recall. Objective tests may and should use more than one
type of question. Each type of question has its own unique merits
and limitations.

a. True-False

This type of test is generally inferior to other types since
the element of "guessing" is always present. Remember, a per-
son who knows absolutely nothing about the subject will aver-
age 50 percent correct by just answering all the questions.
Furthermore, educators claim that even suggesting a negative
answer is a poor practice in teaching.

If true-false tests are used, there should be a relatively
large number of questions, and there should be approximately
an equal number of true questions and false questions. The
student should be required to place a circle around the "T"
or "F" in the corresponding right-hand column. For scoring,
the number of incorrect answers may be subtracted from the
number of correct answers.
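
The scoring rule just described can be worked through in a short sketch. The Python below is illustrative only. The function names are hypothetical, and the general form "right minus wrong divided by (choices minus one)" is the standard correction-for-guessing formula rather than something stated in the source, which gives only the true-false case (two choices, so right minus wrong).

```python
def chance_score(n_items, n_choices):
    """Expected number right for a student who guesses blindly on every item."""
    return n_items / n_choices


def corrected_score(right, wrong, n_choices=2):
    """Score corrected for guessing; omitted items are neither right nor wrong.

    With n_choices=2 (true-false) this reduces to right minus wrong.
    """
    return right - wrong / (n_choices - 1)


# A blind guesser on a 100-item true-false test averages 50 right and 50 wrong,
# so the corrected score is zero.
print(chance_score(100, 2))
print(corrected_score(50, 50))

# A student with 70 right and 30 wrong on the same test.
print(corrected_score(70, 30))
```

Under this rule a student gains nothing on average by guessing, which is the purpose of subtracting the incorrect answers.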

Advantages : 

(1) Comparatively easy to construct 

(2) May be applied to a wide range of subject matter 

(3) Objective and easy to score using a key 

(4) Permits a wide sampling of knowledge in a unit of work

Disadvantages:

(1) Includes negative suggestion

(2) Guessing factor is 50-50. Modifying the scoring corrects for
this factor, but the modification techniques usually confuse
students

b. Multiple Choice

In this type of test the student must select the most appro-
priate answer from a minimum of four possible answers. Care
should be taken to avoid more than one possible correct answer
in the one-correct-answer type or more than one possible
incorrect answer in the reverse multiple choice type.

Advantages : 

(1) Tests judgment, reasoning, and discrimination of students 

(2) Tests more than memory for factual knowledge (tests by
recognition rather than recall)

(3) Very adaptable to who, what, when and where situations 

(4) Reduces guessing factor from one-half to one-quarter 

Disadvantages: 

(1) Practically none

(2) Initial construction of multiple choice items is time-
consuming but this factor is offset by usefulness of
questions

c. Completion

This type of test requires students to supply the answer to
an incomplete statement or question by recalling one or two
words, numbers, dates, or symbols. This type of testing
requires that the student supply the exact answer intended. For
example: "A letter that has a descender is ________." Avoid
ambiguous statements.
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Advantages: 

(1 ) Tests memory 

(2) Stimulates study habits 

(3) Eliminates guessing 

Disadvantages: 

(1) Not a good measurement of student knowledge 
(emotional factors involved in test writing, 
i.e., fear, tension, nervousness) 

(2) More difficult to score 

(3) Measures only factual knowledge 

d. Matching

A matching test is one that consists of matching words in one 
column with a closely related word or words in scrambled or- 
der in a second column. If for no other reason, they are 
used to add a certain amount of variety and interest to the 
otherwise boring task of taking a test. 

Advantages : 

(1) Comparatively easy to construct 

(2) Objective and easy to score 

(3) Efficient as a space and time saver 

(4) When properly constructed, the guessing factor can 
be practically eliminated 

Disadvantages: 

(1) Inferior to multiple choice items for measuring 
judgment and application--apt to stress memoriza- 
tion of facts 

(2) Unless properly constructed may include irrelevant 
clues to correct response 

(3) Unless skillfully prepared may be time-consuming to 
student 
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e. Pictorial Recall 

Identification tests, in which various parts of a drawing are 
to be identified, not only have an interest value, but are 
also quite effective for testing nomenclature for tasks, tools, 
materials, and parts of objects. 

Advantages : 

(1 ) Tests memory 

(2) Stimulates study habits 

(3) Eliminates guessing 

(4) Easy to score 

Disadvantages: 

(1) Measures only factual knowledge 

(2) Emotional factors are involved in writing tests, i.e.,
fear, tension, nervousness

2. A subjective test, such as an essay test, is one that is scored
on the basis of the scorer's personal judgment of the worth of
each answer. Essay type questions are fairly easy to prepare and
are adaptable to most subjects and most classroom conditions. The
chief disadvantage is that they are hard to score fairly. This is
because grading is based chiefly on opinion, which may be
influenced by neatness, "literary" ability rather than subject matter,
or personality conflicts between the student and instructor.

Advantages: 

(1) Measures student's ability to organize his/her thoughts 
and express himself/herself clearly 

(2) Takes a comparatively short time to prepare 
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Disadvantages : 

(1) Time consuming to score 

(2) Difficult to score objectively 

(3) Time-consuming for student to write 

(4) Offers poor coverage of area to be tested 

(5) Penalizes the student who is unable to express himself/ 
herself well 

(6) Lacks reliability 

The Requirements of Good Tests (9)

A test is only as good as its results. In other words, if a test is
"good," it is good because it accomplishes its purpose effectively and
economically in a particular situation. Therefore, a good test is one
that is objective, valid, reliable, comprehensive, and provides for
economy of time in giving and scoring. Analyze these qualities
carefully.

1. Objective 

When a test can be used by two or more examiners of equal compe- 
tence and give identical or similar scores, it is said to have 
objectivity. It is a quality dependent on purely impersonal, fac- 
tual evidence rather than on judgment, personal opinion, or bias. 
Objectivity, therefore, applies to the giving and scoring of a
test and not to the person taking the test.

2. Valid 

When a test measures what it is intended to measure, it is said to
have validity. Validity requires careful selection of test items
to avoid irrelevant and nonessential questions that are not true
measures of knowledge or ability. Every item in the test should
be representative of the main purpose of the unit of study being
tested.

3. Reliable

A test is said to have reliability when it gives consistent results 
whether given at different intervals to the same group or given to 
different groups who have received the same instruction. Reliability,
therefore, refers to the accuracy with which a test measures
the things that it is supposed to measure.

4. Comprehensive

A test should provide adequate coverage of the subject or that part
of the subject to be tested. The questions should cover all the
points emphasized in the lesson. Written tests, such as the old 
essay type, have only a few questions and are obviously not compre- 
hensive. 

5. Convenient

A test should be easy to use and should provide for economy of time 
in administering and scoring. Its construction should be such that 
it is possible to test a larger number of items in a class period, 
and the instructor is able to score a larger number of tests with 
true objectivity. 
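
Two of the qualities above lend themselves to a simple numerical check. The sketch below is only an illustration, not a procedure taken from this guide: it assumes objectivity can be approximated by the share of papers on which two scorers agree, and reliability by the correlation between two administrations of the same test. The function names and all scores are hypothetical.

```python
from math import sqrt


def percent_agreement(scores_a, scores_b):
    """Share of papers on which two scorers assigned the same score."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and the same length")
    same = sum(1 for a, b in zip(scores_a, scores_b) if a == b)
    return same / len(scores_a)


def pearson_r(first, second):
    """Pearson correlation between two administrations of the same test."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_1 = sum(first) / len(first)
    mean_2 = sum(second) / len(second)
    cov = sum((x - mean_1) * (y - mean_2) for x, y in zip(first, second))
    ss_1 = sum((x - mean_1) ** 2 for x in first)
    ss_2 = sum((y - mean_2) ** 2 for y in second)
    if ss_1 == 0 or ss_2 == 0:
        raise ValueError("correlation is undefined when all scores are equal")
    return cov / sqrt(ss_1 * ss_2)


# Hypothetical data: five papers scored by two instructors (objectivity) ...
scorer_1 = [18, 15, 20, 12, 17]
scorer_2 = [18, 15, 19, 12, 17]
# ... and five students tested twice, some weeks apart (reliability).
first_try = [18, 15, 20, 12, 17]
second_try = [17, 14, 20, 13, 18]

agreement = percent_agreement(scorer_1, scorer_2)
stability = pearson_r(first_try, second_try)
print(f"scorer agreement: {agreement:.0%}")
print(f"test-retest correlation: {stability:.2f}")
```

A high agreement figure suggests that scoring depends little on who does it; a high correlation suggests that the test ranks the same students in much the same way on both occasions. Neither figure says anything about validity.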

Having read the preceding material on "Types of Tests," "Objective 
Tests and Subjective Tests: Advantages and Limitations," and "The 
Requirements of Good Tests," fill in the chart on the following page, 
indicating the advantages and limitations of objective tests and 
subjective tests for the criteria indicated.

                       Objective Tests              Subjective Tests
                       Advantages    Limitations    Advantages    Limitations

1. Objectivity

2. Validity

3. Reliability

4. Comprehensiveness

5. Convenience

In Module 7, Derivation and Specification of Instructional Objectives,
you wrote instructional objectives for specific occupational tasks.
(See the Objectives Specification Sheet you completed on page 53 of the
Study Guide for Module 7.) In Module 8, Development of Instructional
Materials, you selected instructional strategies for accomplishing these
objectives. (See the Selection of Instructional Strategies forms you
completed on page 30 of the Study Guide for Module 8.) Now you will
have the opportunity to select assessment approaches/techniques for the
objectives you specified in Module 7. Be sure you have objectives
representing the three domains of learning: cognitive, affective, and
psychomotor.

Look at each of your objectives and select an approach/technique to 
assess students' mastery of that objective. Use a form like the one 
provided on the next page to write the objective and the corresponding
assessment technique. (You will need to prepare a form for each 
objective. Adequate space is not provided in this guide, so use 
additional sheets of paper as necessary.) 

SELECTION OF ASSESSMENT APPROACHES/TECHNIQUES 



Objective: 



Domain: 



Assessment Approach/Technique:

Goal 9.3

Content Outline

Activities-Resources

Construct Test Instruments for Measuring
Student Achievement of Instructional
Objectives.



Test Construction 

Test construction is a highly specialized subject, 
and many texts exist on the subject. The 
important point here is that good test construc- 
tion is absolutely essential to the success of 
instructional materials development. The entire 
developmental process can stand or fall on the 
quality of the criterion tests.

Implementing Criterion-Referenced Measurement 

To summarize this module on criterion-referenced 
testing, there are three essential steps to
remember in implementing criterion-referenced 
measurement. They are: 

1. Prior to instruction, prepare a set of 
instructional objectives for the unit 
of instruction. 

2. Select appropriate assessment approaches/ 
techniques to assess students' mastery of 
the stated objectives. The appropriateness 
of the technique selected will largely
be determined by the nature of the skill
or competency specified in the objectives. 

3. Match particular assessment techniques
selected with the performance behaviors
delineated in the instructional objectives.
It is important that the specific test
item correspond to the level and type of
behavior specified in the objective.
Perhaps the single most effective method
of ensuring precision and accuracy in
criterion-referenced measurement is the
careful matching of test items to
performance objectives (14).
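
The matching described in step 3 can be pictured as a simple comparison. The sketch below is a hypothetical illustration only: it assumes each objective and each candidate test item has been tagged by hand with the kind of behavior it calls for, and it flags the items whose tag differs from the objective's. The class names, behavior labels, and sample objective are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Objective:
    text: str
    behavior: str  # the kind of performance the objective calls for


@dataclass(frozen=True)
class CandidateItem:
    text: str
    behavior: str  # the kind of performance the item actually demands


def mismatched_items(objective, items):
    """Return the items whose demanded behavior differs from the objective's."""
    return [item for item in items if item.behavior != objective.behavior]


# Hypothetical objective and items, tagged by hand with behavior labels.
objective = Objective("Be able to replace a bicycle inner tube.", "perform")
items = [
    CandidateItem("List the steps for replacing an inner tube.", "recall"),
    CandidateItem("Given a wheel with a flat tire, replace the inner tube.", "perform"),
    CandidateItem("Explain why tire levers are used.", "explain"),
]

for item in mismatched_items(objective, items):
    print(f"Does not match objective ({objective.behavior}): {item.text}")
```

The judgment about which behavior an item really demands still rests with the test writer; the comparison only makes that judgment explicit for every item.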



Wrapup of Module

(14) Objectives for Instruction and Evaluation, p. 118.

*See Classroom Activity 4 in Part III.

*See Discussion Question F in Part III.

Study Activities

Based on your reading of the content outline and any additional references
as suggested, complete the following activities.

Read pp. 40-43 ("Criterion Examination") of Mager and Beach, Developing
Vocational Instruction. Then complete the following activity.

For each of the objectives provided in this exercise, there is a list of 
possible test items. Indicate by writing "yes" or "no" whether or not 
that test item is appropriate for assessing the objective.



a. OBJECTIVE: When approached by a prospective customer, respond in
a positive manner (with a smile, a suitable greeting, and
pleasant tone of voice).

___ a. Describe the three basic characteristics of a positive
response to the approach of a prospective customer.

___ b. Look at the following ten photographs and write the number
of those that represent a correct response to the approach
of a prospective customer.

___ c. Watch the following ten film clips and write down the
number of those that represent a correct response to the
approach of a prospective customer.

___ d. When the instructor hangs the "customer" sign around
his neck and approaches you, make the correct response
to the approach of a prospective customer.

___ e. Write a paragraph describing the importance of each
element of the response to customer approach.

___ f. When approached by each of five students selected by
the instructor, make the appropriate response to customer
approach.

b. OBJECTIVE: Be able to type a business letter in accordance with
standards described in Company Manual 12-21.

___ a. Describe the five basic elements of a business letter.

___ b. Sort the ten sample letters into piles representing
those that are written in accordance with Company
standards and those that are not.

___ c. On the five sample letters given, circle any errors or
items not in accordance with Company standards.

___ d. Describe in a paragraph the rationale for the business
letter standards currently in effect.

___ e. From the rough copy given, type a business letter in
the form set out by Manual 12-21.

___ f. Tell how you would instruct a secretary in the preparation
of business letters according to current policy.

c. OBJECTIVE: Be able to read a domestic electric power meter correctly
to the nearest unit and record it on the appropriate
page of the Meter-Reader's log.

___ a. Define kilowatt-hour.

___ b. Of the five dials on the domestic meter, which records
"thousands of units"?

___ c. Look at this picture of a dial. What is the reading?

___ d. Look at the dials on these domestic meters. What are
the readings?

___ e. Record on the appropriate page of your log the readings
of each of these ten domestic meters.

d. OBJECTIVE: Be able to construct a parallelogram.

___ a. Define parallelogram.

___ b. Describe the difference between a parallelogram and a
rectangle.

___ c. Look at the following figures and draw a circle around
the parallelograms.

___ d. Draw a parallelogram whose sides are 1" and 3" in length.



Read the Weber and Lucas article, "Evaluating Student Progress," in 
The Individual and His Education. Then complete the activity below. 



As a final activity in Module 8, Development of Instructional Materials , 
you developed a lesson plan for a unit of instruction. (See Goal 8.2,
p. 69, in the Study Guide for that module.) Now you will have the 
opportunity to develop an evaluation plan for assessing achievement 
of the objectives for that unit of instruction. 

Using the Weber and Lucas article as a guide, develop a Table of Specifi- 
cations for your unit of instruction. This table will serve as a plan 
for test development. Indicate the content of your unit of instruction 
and the various dimensions of your instructional objectives.

When you have completed the Table of Specifications, look at it to see
if it represents a balanced evaluation scheme. Then answer the following
questions.

a. Does your Table of Specifications represent a balanced evaluation
scheme? If not, how would you explain this? Perhaps there is a
good reason.

b. If your Table of Specifications is not balanced and there doesn't
seem to be good reason for this, how would you change the content 
and objectives for your unit of instruction in order to create a 
more balanced unit? 
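
One way to look at balance is to total the planned test items by content topic and by behavior level. The sketch below is a minimal illustration under stated assumptions: the table layout, the topic names, the three level names, and the item counts are all hypothetical and are not taken from the Weber and Lucas article.

```python
# Hypothetical table of specifications: rows are content topics, columns are
# behavior levels, and each cell is the number of planned test items.
LEVELS = ["knowledge", "comprehension", "application"]
table = {
    "Meter parts and terms": [4, 2, 0],
    "Reading a single dial": [1, 2, 3],
    "Recording in the log": [0, 1, 3],
}


def row_totals(spec):
    """Items planned for each content topic."""
    return {topic: sum(counts) for topic, counts in spec.items()}


def column_totals(spec, levels):
    """Items planned at each behavior level."""
    totals = [0] * len(levels)
    for counts in spec.values():
        if len(counts) != len(levels):
            raise ValueError("every row needs one count per level")
        for i, count in enumerate(counts):
            totals[i] += count
    return dict(zip(levels, totals))


def shares(totals):
    """Convert counts to proportions of all planned items."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("the table contains no items")
    return {name: count / grand_total for name, count in totals.items()}


topic_share = shares(row_totals(table))
level_share = shares(column_totals(table, LEVELS))
for name, share in {**topic_share, **level_share}.items():
    print(f"{name:<24}{share:.0%}")
```

Whether the resulting shares count as "balanced" depends on the emphasis each topic and level received in your instruction; the totals only make any lopsidedness easy to see.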

Having read the Weber and Lucas article, "Evaluating Student Progress,"
and having constructed an evaluation plan for your unit of instruction, 
you will now have an opportunity to actually construct the test instru- 
ments for the unit. 

Develop written tests, performance tests, and tests to measure attitudes,
as appropriate. The important concern is to develop test items that are
appropriate for assessing achievement of your objectives.

Many texts on principles of test item construction exist and several are 
mentioned in the Weber and Lucas article. Feel free to use whatever of 
these reference materials you find necessary to develop test items. 
(Adequate space is not provided in this guide, so use additional sheets 
of paper as necessary.)

PART III

GROUP AND CLASSROOM ACTIVITIES



Classroom Activities 

NOTE: The following activities are designed to stimulate discussion in 
the classroom on specific topics covered in this module. The activities 
are designed to be used after student self-study; however, depending on
the background and abilities of students, these activities may not 
require previous self-study. All classroom activities are keyed to the 
Content Outline to indicate an appropriate point for participation. 

1. Debate the following issue: 

Norbert Wiener noted that the human brain is able to handle value
ideas--ideas that are not quantifiable and that any computer would
have to reject as formless. (Norbert Wiener, God and Golem, Inc.
Cambridge, Mass.: The M.I.T. Press, 1964, p. 73.) Yet some proponents
of behavioral objectives contend that the teacher is not
engaged in instruction when dealing with objectives that are not
describable in terms that can be quantifiably measured. (W. James
Popham and Eva L. Baker, Systematic Instruction. Englewood Cliffs,
New Jersey: Prentice-Hall, Inc., 1970, p. 141.) (26)

Students should divide into two teams, one team representing the 
Wiener point of view, and one team representing the Popham and Baker 
point of view on this issue of measuring learning outcomes in 
connection with instructional objectives. If there are students in 
the class who do not feel strongly about either point of view, they
should form a third team representing a point of view midway between
the extremes.

2. Discuss the following issue: A contemporary educational critic
contends that existing practice "makes it clear to students that the
purpose of testing is not evaluation but rating--to produce grades
that enable the school to rank students and sort them in various ways
for administrative purposes. The result is to destroy any interest in
learning. . ." (Charles E. Silberman, Crisis in the Classroom. New
York: Random House, 1970, p. 348.) Do you agree with this statement?
Explain the reasons for your point of view. (26)

Students should provide examples from any vocational courses they have 
taken that support their point of view on this issue. 

3. Students should group themselves into two teams to debate the issue of
whether or not instructors or curriculum specialists can more effectively
acquire a set of measurable and appropriate objectives with
corresponding test items by generating their own or by selecting them
from other sources.

Consider such practical matters as: 

Are other sources of measurable objectives with corresponding test
items available? If so, are these objectives and test items suitable
for the local situation?

Are instructors or curriculum specialists likely to have time to
generate their own objectives and test items? If not, is it possible
that a local curriculum [illegible] effectively function to
develop objectives and test items?

4. As a wrapup activity for this module, select any learning module
(teacher-made or commercial) that may be available in the classroom.
Students should analyze the module for match of objectives and test
items. Based on how well objectives and test items match, students
should determine whether or not they would recommend the module for
further use.

Activities for Additional Credit

NOTE: These activities are designed for the student who wishes to obtain
additional credit beyond the basic requirements of this module. You may
choose to write a paper on one of these activities, or discuss the activity
with the instructor, or you may select some other method to complete the
activity.

1. Examine some teacher-made tests in your area of specialization and
evaluate each item according to the cognitive levels represented in
Bloom's Taxonomy of Educational Objectives. What cognitive levels
receive the greatest emphasis? What cognitive levels are totally
neglected? Can you devise some test items that represent the cognitive
levels of application, analysis, synthesis, and evaluation?

2. Do the same for the affective domain as you did in Item 1 above.

3. In Silvius and Bohn, Planning and Organizing Instruction (22), the
authors recommend the following book for procedures to develop
criteria for measuring student achievement of instructional objectives
in the affective domain: Robert F. Mager, Goal Analysis
(Belmont, California: Fearon Publishers, 1972). Read this book--
it's another Mager shortie--and using the procedures presented
there, describe the specific performances that would indicate
achievement of several of the affective objectives from the unit of
instruction you developed for this group of modules. When you
have listed these specific performances, develop criterion test
items that indicate whether or not the student has achieved the
affective objective.

4. Visit two vocational classes in your area of specialization and
describe the types of measurement being used to assess student
achievement. For what purposes are these types of measurement
being used? Is the use appropriate for the purpose? What
recommendations would you make to improve the methods of assessing
student achievement?

5. In Volume I of his Educational Psychology, E. L. Thorndike (1913)
described the problems of judging student achievement in relation
to other students' achievement. Cite arguments from Thorndike that
would support today's concept of "criterion-referenced" measurement
as a means to assess student achievement.

6. It is possible that someday, upon designation of instructional
objectives, a computer could devise a table of specifications
(evaluation plan), select test items from a test-item pool, print the
test, score and grade it, and analyze the results. What do you see
as the possible advantages and disadvantages of such a
criterion-referenced measurement system?

7. By researching the literature, or by any other means of your choosing,
locate an example of a testing program in any area of vocational
education that uses both norm-referenced and criterion-referenced
tests. Then in a 3 to 5 page paper, summarize this program,
highlighting the appropriate and effective uses of norm-referenced
measurement and criterion-referenced measurement. Use the Ocean View
program described in the Special Report: The Nature and Uses of
Criterion-Referenced and Norm-Referenced Achievement Tests (page 31
of the Study Guide) as a guide for the preparation of your paper.

PART V
APPENDICES 

Appendix A: 

Possible Study Activity Responses 



GOAL 9.1 

1a. b

b. a 

c. b 

d. b 

2a. a 

b. b 

c. b 

d. a 



3a. Psychological tests were administered to thousands of Army recruits 
during World War I. These tests, which employed the concept of the 
IQ or "mental ratio," caught on, the idea spread, and the form was set. 
When the war was over, the schema of the mental test, invented to dis- 
cover and predict aptitude, was remodeled for school use--not only for 
this purpose but also to test school achievement for diagnostic and 
training purposes. Standardized subject-matter tests and test batteries
multiplied. (20)

b. Robert Glaser coined the term "criterion-referenced measurement" in a 
1963 article in the American Psychologist, called "Instructional Tech- 
nology and the Measurement of Learning Outcomes: Some Questions." 

c. Probably the programmed instruction movement of the early 1960s with 
its emphasis on behavioral objectives gave the greatest impetus to 
criterion-referenced measurement (which measures individual achieve- 
ment of objectives) . 

d. In large part, achievement measures currently employed in education
are norm-referenced. This emphasis upon norm-referenced measures 
has been brought about by the preoccupation of test theory with ap- 
titude, and with selection and prediction problems; norm-referenced 
measures are useful for this kind of work. However, the imposition 
of this kind of thinking on the purposes of achievement measurement 
raises some questions (20).

e. Glaser's primary concern regarding the measurement of learning
outcomes is how to assess existing levels of competence and
achievement and the conditions that produce them.

4a. It is in the technical sense that criterion-referenced tests are 
rather new. Only in the last few years have those concerned with 
the theoretical and technical issues of educational and psycholo- 
gical measurement undertaken any sustained large-scale effort to 
deal with criterion-referenced tests. It has been even more re- 
cently that publishers of tests for schools have tried to offer 
any substantial tests of this sort. Therefore, consensus among 
the experts about technical requirements for the tests has yet to 
be reached and little about these matters can be found in courses 
and textbooks. 

b. Other current terms in use for criterion-referenced tests include: 
"domain-referenced," "objectives-based." 

c. There is little unanimity of opinion about the meaning of the word 
"standardized" as a descriptor of tests. If "standardized" is de- 
fined as describing a test that has explicit instructions for ad- 
ministration and that was constructed to meet technical standards, 
then a criterion-referenced test can be considered standardized. 



5a. b 

b. a 

c. a

d. b 

e. a 

f. b 



6a. VARIABILITY

CRM: Variability is irrelevant; it is not a necessary condition for
a good criterion-referenced test.

NRM: Variability is essential; since the meaningfulness of a
norm-referenced score is basically dependent on the relative position of
the score in comparison with other scores, the more variability in 
the scores the better. 

b. ITEM CONSTRUCTION 

CRM: Criterion-referenced item writers are guided by the goal of
making sure the item is an accurate reflection of the criterion
behavior.

NRM: Norm-referenced item writers are guided by the goal of
developing items that produce variability.


c. RELIABILITY



CRM: Criterion-referenced tests should be reliable, that is, they 
should be internally consistent. However, it is not obvious how to 
assess the internal consistency; the classical procedures are not 
appropriate because they are dependent on score variability. Every 
student could obtain a perfect score on a criterion-referenced test, 
yet by classical standards this test would not be considered inter- 
nally consistent. 

NRM: Norm-referenced tests should be reliable, that is, they should 
be internally consistent. Classical procedures for assessing inter- 
nal consistency are appropriate for norm-referenced measurement. 

d. VALIDITY

CRM: Criterion-referenced measures are validated primarily in terms
of the adequacy with which they represent the criterion.

NRM: Many of the procedures for assessing the validity of
norm-referenced tests are based on correlations and thus on variability.

e. ITEM ANALYSIS

CRM: For criterion-referenced tests, the use of discrimination in- 
dices (item analysis procedures that identify those items that do 
not properly discriminate among individuals taking the test) must 
be modified. An item that does not discriminate need not be eli- 
minated. 

NRM: Item analysis procedures have traditionally been used with 
norm-referenced tests to identify those items that do not properly
discriminate among individuals taking the test. If an item does 
not properly discriminate between the more and less knowledgeable 
learners, the item should be eliminated. 

f. REPORTING AND INTERPRETATION



CRM: When interpreting an individual's performance on a criterion- 
referenced test, group-relative indices are not appropriate. The 
individual has either mastered the criterion or he has not. In
reporting an individual's performance, one alternative is the use of
an "on-off" approach; the student either has or has not achieved 
the criterion. 

NRM: In interpreting the results of an individual's performance 
on a norm-referenced test, the concern is with the individual's
performance in relation to the performance of other individuals. 
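
The point made under reliability, that classical procedures depend on score variability, can be seen with a small calculation. The sketch below uses Kuder-Richardson formula 20 as one example of a classical internal-consistency index; the choice of that index, the function name, and both data sets are assumptions made for illustration and are not drawn from this guide.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for right/wrong (1/0) item scores.

    `responses` holds one list per student, one 1 or 0 per item.
    Returns None when total scores do not vary, because the formula
    divides by the variance of the totals.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    if n_students < 2 or n_items < 2:
        raise ValueError("need at least two students and two items")
    totals = [sum(student) for student in responses]
    mean_total = sum(totals) / n_students
    variance = sum((t - mean_total) ** 2 for t in totals) / n_students
    if variance == 0:
        return None
    pq_sum = 0.0
    for i in range(n_items):
        p = sum(student[i] for student in responses) / n_students
        pq_sum += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - pq_sum / variance)


# Hypothetical norm-referenced results: scores spread out across students.
spread = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical criterion-referenced results: every student masters every item.
all_mastered = [[1, 1, 1, 1] for _ in range(5)]

print("spread scores:", kr20(spread))
print("all mastered: ", kr20(all_mastered))
```

When every student earns a perfect score, the totals have no variance and the index cannot be computed at all, even though the test may be doing exactly what a criterion-referenced test is meant to do.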

GOAL 9.2

1a. c

b. a 

c. b 

d. b 

e. a 



2. (See following pages) 





1. Objectivity

   Objective tests
      Advantages: This is the major advantage of objective tests.

   Subjective tests
      Limitations: This is the major limitation of subjective tests.

2. Validity

   Objective tests
      Advantages: If objective test items are tied directly to
      instructional objectives, validity can be a major advantage of
      objective tests.
      Limitations: If objective test items are not tied directly to
      instructional objectives, then one can question the validity of
      these items.

   Subjective tests
      Advantages: If standards and criteria for assessment are
      specified, then a subjective test can be valid.
      Limitations: If standards and criteria for assessment are not
      specified, then one can question the validity of these items.

3. Reliability

   Objective tests
      Advantages: If test items are well-written and precise, then
      reliability can be a major advantage.
      Limitations: If test items are vaguely worded and open to
      interpretation, then they are not reliable.

   Subjective tests
      Limitations: Subjective tests often lack reliability.



Discussion Questions 



A. What is the difference between measurement and evaluation? 

(For many years educators in this country have tossed around the term
"evaluation" with almost indifferent imprecision. For some, the
expression referred exclusively to the grading operations wherein 
pupils were assigned A, B, C, etc. To others, it meant essentially 
the same as "measurement." Still others thought of evaluation as 
experiments to discover if Method A was better than Method B. Although
each of these notions of educational evaluation has been subscribed to 
by many, each is clearly inconsistent with the conception of educational 
evaluation endorsed by most educational leaders today.) (19) 



B. What educational phenomena of the 1960s might have given impetus to
criterion-referenced measurement?

(Probably the programmed instruction movement of the early 1960s with 
its emphasis on measurable instructional objectives gave the greatest 
impetus to criterion-referenced measurement--which measures student 
achievement of instructional objectives.) 



C. What advantages, limitations, and/or dangers do you see in the position
that teachers must specify and measure all instructional objectives?

(This is a matter of great debate in educational circles. Proponents 
of instructional objectives hold that the only sensible reason for the 
educator's engaging in instruction is to modify the learner's behavior; 
therefore, these intended changes must be described in terms of
measurable learner behaviors. On the other hand, educators who do not
feel the need to specify all learner behavior in terms of instructional
objectives believe that the problem with excessive insistence on
building specifications for each and every instructional objective is that
human beings are not built like automobiles or washing machines. The
consequence of such detailed specifications in education, they feel,
is that achievement comes to denote the sort of thing that a
well-planned machine can do better than a human being can, and the main
effect of education, the achieving of a life of rich significance,
drops by the wayside.) (26)

D. Many educators acknowledge that affective objectives and their
measurement have not received adequate emphasis in the curriculum,
particularly the vocational curriculum. How do you account for this?

(Some of the major reasons for neglect of affective objectives in 
education include: 

1. The failure of teachers and curriculum specialists to appreciate
fully the necessary interrelationships and interdependence between
affective and cognitive objectives;

2. the concentration on cognitive learnings in the curriculum reform
projects of the 1950s and 1960s;

3. the emphasis on cognitive learnings--particularly those at the
lower levels of the taxonomy--in traditional schooling, coupled
with the long-held concept of mind as an entity separate from the
emotions, along with the persistent, time-worn belief that the
mind is best strengthened through rigorous intellectual exercise;

4. the new emphasis on operant conditioning and the treatment of the
learner as an automatic mechanism;

5. the enormous difficulty inherent in teaching for and evaluating
affective learning in conjunction with cognitive learning;

6. the controversies attached to evaluating attitudes, feelings,
emotions, and values.) (26) 

E. Discuss the relation of performance tests to written tests. When
might a written test be considered a performance test? Many instructors
speak of written tests and performance tests as if they were separate
and distinct types. Written tests are often considered poor measures
of proficiency, while performance tests are thought to constitute the
only real measures of performance. Is this necessarily true?

(A performance test is a test which requires a student to accomplish
a job-like task under controlled conditions. Because some written
tests are indeed job-like, they too can be considered performance
tests. On the other hand, just because a test item involves equipment
and requires the student to perform something does not mean that it
is job-related. Also, the fact that performance is involved does not
assure accurate measurement of ability. The real requirement is that
the test situation make demands of the student that are as similar as
possible to those of the job. It is what is measured that counts in a
performance test--not the procedure by which it is measured.) (5)



F. Do you have any problems or concerns regarding this module?

PART IV
STUDENT SELF-CHECK



Part A: Knowledge Assessment

GOAL 9.1

1. What are the definitions of the following items? (9.11) 

a. educational evaluation:

b. educational measurement:

c. criterion-referenced testing:

d. norm-referenced testing:



2. What factors in American education contributed to an increasing
emphasis on criterion-referenced measurement? (9.12)

3. Which type of test has as its major purpose the determination of the
student's relative position within a group of students? (9.13)

a. criterion-referenced test

b. norm-referenced test

4. Which type of test consists of test items that are constructed to
measure a predetermined level of proficiency? (9.13)

a. criterion-referenced test

b. norm-referenced test

5. State the difference between norm-referenced measurement and
criterion-referenced measurement on the basis of: variability,
reliability, and validity. (9.14)

a. Variability: 

b. Reliability: 

c. Validity:

GOAL 9.2

6. As part of a performance test, an instructor observes a student to 
determine whether or not he is able to clean and make a hospital bed, 
following correct hospital procedure. What learning domain is this 
technique primarily assessing? (9.21) 

a. psychomotor domain 

b. cognitive domain 

c. affective domain

7. What are the two basic types of test questions used in paper-and- 
pencil tests? (9.22) 

8. State the differences between objective tests and subjective tests on 
the bases of: reliability and comprehensiveness. (9.22) 

a. Reliability: 

b. Comprehensiveness:

9. What is a "performance test"? (9.23) 

GOAL 9.3

10. Which of the following test items is appropriate for assessing this 
objective: Be able to recognize when a torch flame is appropriate 
for cutting half-inch steel? (9.31) 

a. Describe the characteristics of a torch flame that is appro- 
priate for cutting half-inch steel. 

b. Look at the following eight color slides of good and bad 
flames and write the number of those appropriate for cutting 
half-inch steel.

c. Given a welding torch, adjust the flame until it is appro- 
priate for cutting half-inch steel. 

d. Tell how you would adjust the flame of a welding torch to 
make it appropriate for cutting half-inch steel.

Part B: Performance Assessment

The purpose of this part of the test is to assess your ability to perform 
some of the actual steps involved in the construction of criterion- 
referenced test instruments. You should complete it outside of class and 
use any reference materials that may be helpful. Be sure to have the 
materials you developed for the Performance Assessment portions of the 
Module 7 and Module 8 Self-Checks. You will now have a chance to build on 
the materials you completed there. 

This test consists of completing each of the following items in order. As 
you finish each item, check it off and continue to the next. If you find 
any of the forms suggested in the Study Guide helpful in completing these 
steps, use them. Otherwise, you may use your own particular forms, as long 
as you complete each step below as indicated. 

1. Select approaches/techniques for assessing the objectives of the 
two units of instruction for which you developed lesson plans in 
the Module 8 Performance Assessment. This should be a general 
listing of several possible approaches to assessing the objectives. 
Later, you will select one of these approaches and construct test 
instruments following the approach you selected. (9.24) 

2. In the Module 8 Performance Assessment, you developed lesson plans 
for two units of instruction: one unit primarily in the cognitive 
or affective domain, and the other unit in the psychomotor domain. 
Now you are to develop an evaluation plan for assessing achieve- 
ment of the objectives for those units of instruction. Using the 
Weber and Lucas article in The Individual and His Education as an 
aid, develop a Table of Specifications for each unit of instruction. 
These tables will serve as a plan for test development. Indicate 
the content of your instructional units and the various dimensions 
of your instructional objectives. (9.32) 

3. From the list of possible approaches/techniques for assessing 
objectives that you developed for Item 1, select a final approach 
and construct criterion-referenced test instruments for your two 
units of instruction. Develop written tests, performance tests, and 
tests to measure attitudes, as appropriate. The important concern 
is to develop test items that match the specifications of your 
objectives. (9.33) 



                  Objective Tests                         Subjective Tests
CRITERION         Advantages          Limitations         Advantages          Limitations

4. Comprehen-     Objective test                                              This is another
   siveness       items permit a                                              major limitation
                  wide sampling of                                            of subjective
                  knowledge, there-                                           tests.
                  fore comprehen-
                  siveness is a
                  major advantage.

5. Convenience    Objective tests     Writing objective   Writing subjective  Subjective tests
                  are easy to         tests is time-      tests takes a       are time-consuming
                  administer and      consuming.          relatively short    to score.
                  score.                                  time.

(The specific response to this activity depends on the particular ob- 
jectives you specified in Module 7. In general, the approaches/tech- 
niques you select for measuring achievement of these objectives should 
be appropriate for assessing the specific behavior described by the ob- 
jectives. For example, if an objective specifies that a student should 
be able to demonstrate some job-related task, the assessment approach 
should require the student to actually demonstrate that task, not to 
take a written multiple-choice test.) 



GOAL 9.3 

1a.  a. no    b. no    c. no    d. yes   e. no    f. yes

     a. no    b. no    c. no    d. no    e. yes   f. no

     a. no    b. no    c. no    d. no    e. yes

     a. no    b. no    c. no    d. yes



2. (The specific response to this activity depends on the particular objec- 
tives you specified for your particular unit of instruction. If your 
Table of Specifications does not represent a balanced evaluation scheme, 
there may be good reason. Perhaps the subject of your course emphasizes 
objectives of one type. For example, a course in retail arithmetic em- 
phasizes application.) 

(The specific response to this activity depends on the particular ob- 
jectives you developed for your particular unit of instruction. The 
important concern is that your test items assess the behavior specified 
by the objectives. Check with your instructor.) 




Appendix B: 

Possible Self-Check Responses 



Part A: Knowledge Assessment 
GOAL 9.1 

1. What are the definitions of the following terms? (9.11) 

a. educational evaluation: 

The determination of the worth of educational phenomena; 
the term generally refers to the evaluation of an educa- 
tional enterprise, such as an instructional sequence, not 
to the evaluation of students within that enterprise. 

b. educational measurement: 

The assessment of the current status of an educational 
phenomenon in a precise fashion--that is, counting or 
enumerating so that the phenomenon can be more accurately 
described--without placing value (goodness or badness) on 
the phenomenon thus described. 

c. criterion-referenced testing: 

A form of educational measurement that ascertains an 
individual's status with respect to some criterion or per- 
formance standard. 

d. norm-referenced testing: 

A form of educational measurement that ascertains an indi- 
vidual's performance in relationship to the performance of 
other individuals on the same measuring device. 

2. What factors in American education contributed to an increasing 
emphasis on criterion-referenced measurement? (9.12) 

Probably the programmed instruction movement of the 
early 1960s with its emphasis on behavioral objectives 
gave the greatest impetus to criterion-referenced measure- 
ment. 

The appearance of Robert Glaser's 1963 article in the 
American Psychologist, "Instructional Technology and 
the Measurement of Learning Outcomes: Some Questions," 
in which the term "criterion-referenced" measurement 
appeared for the first time, also drew attention to CRM. 



3. Which type of test has as its major purpose the determination of 
the student's relative position within a group of students? (9.13) 

a. criterion-referenced test 



X b. norm-referenced test 



4. Which type of test consists of test items that are constructed to 
measure a predetermined level of proficiency? (9.13) 

X a. criterion-referenced test 



b. norm-referenced test 
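
The contrast drawn in the two questions above can be made concrete with a small sketch. The Python below is illustrative only: the scores, the cutoff of 80, and the function names are assumptions and do not come from the module. It reads one student's score in two ways, first against a predetermined standard (criterion-referenced) and then against the rest of the group (norm-referenced, shown here as a simple percentile rank).

```python
def meets_criterion(score, cutoff):
    """Criterion-referenced reading: compare the score with a fixed standard."""
    return score >= cutoff


def percentile_rank(score, group_scores):
    """Norm-referenced reading: percent of the group scoring below this score."""
    below = sum(1 for s in group_scores if s < score)
    return 100.0 * below / len(group_scores)


# Hypothetical class results on a 100-point test (assumed data).
group = [95, 90, 85, 82, 60]
student = 82
cutoff = 80  # assumed proficiency standard

print(meets_criterion(student, cutoff))  # True
print(percentile_rank(student, group))   # 20.0
```

In this invented class, the score of 82 meets the standard yet ranks near the bottom of the group, which is why the two kinds of tests answer different questions.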



5. State the difference between norm-referenced measurement and crite- 
rion-referenced measurement on the basis of: variability, reliability, 
and validity. (9.14) 

a. Variability: 

With norm-referenced measurement, the more variability in 
the test scores the better, since the meaningfulness of 
the score is basically dependent on the relative position 
of the score in comparison with other scores. 

Variability is not a necessary condition for a good crite- 
rion-referenced test; in fact, variability is irrelevant 
since the meaning of the score is not dependent on comparison 
with other scores. 

b. Reliability: 

A norm-referenced test is reliable when all the items in it 
"measure the same thing" to some minimal extent, that is, 
when the test is internally consistent. Classical procedures 
to assess internal consistency are dependent on score 
variability and thus are appropriate only for norm-referenced 
tests. 

Obviously, criterion-referenced tests should be internally 
consistent, but it is not obvious how to assess this. The 
classical procedures are not appropriate, and indices to 
assess internal consistency of criterion-referenced tests 
have not yet been developed. 

c. Validity: 

For norm-referenced tests, the assessment of their validity 
is based on correlations and thus on variability. 

For criterion-referenced tests, the assessment of their 
validity is based primarily on the adequacy with which 
they represent the criteria. 
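
The dependence of classical reliability procedures on score variability can be seen in a short calculation. The sketch below uses Kuder-Richardson formula 20 (KR-20), chosen here as one familiar classical index of internal consistency; the module does not name a particular formula, and the response data are invented for illustration. The index divides by the variance of the total scores, so it cannot be computed when every student reaches the same score, as may happen on a criterion-referenced test after successful instruction.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for rows of 0/1 item scores, one row per student.

    Assumes at least two items. Returns None when the total scores
    have no variability, because the index is then undefined.
    """
    n_students = len(responses)
    k = len(responses[0])
    totals = [sum(row) for row in responses]
    var_total = pvariance(totals)
    if var_total == 0:
        return None
    pq_sum = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n_students  # proportion correct
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Assumed data: five students, four items.
spread_group = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
mastery_group = [[1, 1, 1, 1]] * 5

print(round(kr20(spread_group), 2))  # 0.8
print(kr20(mastery_group))           # None
```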



GOAL 9.2 

6. As part of a performance test, an instructor observes a student to 
determine whether or not he is able to clean and make a hospital 
bed, following correct hospital procedure. What learning domain 
is this technique primarily assessing? (9.21) 

X a. psychomotor domain 

b. cognitive domain 

c. affective domain 



7. What are the two basic types of test questions used in paper-and- 
pencil tests? (9.22) 

1. objective test questions 

2. subjective test questions 



8. State the differences between objective tests and subjective tests 
on the bases of: reliability and comprehensiveness. (9.22) 

a. Reliability: 

Reliability is the greatest advantage of objective tests 
since test scoring is void of teacher bias. 

In most cases, the reliability of subjective tests is 
very low because scoring depends on the individual teacher 
evaluating the test, thereby opening the door for indi- 
vidual bias or prejudice. 

b. Comprehensiveness: 

Since objective tests permit a wide sampling of knowledge, 
comprehensiveness is a major advantage. With a test item 
for every objective, a criterion-referenced objective 
test will necessarily give comprehensive coverage of all 
desired behaviors. 

Comprehensiveness is a major limitation of subjective tests. 



9. What is a "performance test"? (9.23) 

A performance test is a test that evaluates, under realistic 
conditions, the performance of tasks that have value in some 
life situation. 

GOAL 9.3 

10. Which of the following test items is appropriate for assessing this 
objective: Be able to recognize when a torch flame is appropriate 
for cutting half-inch steel? (9.31) 

a. Describe the characteristics of a torch flame that is 
appropriate for cutting half-inch steel. 

X b. Look at the following eight color slides of good and bad 
flames and write the number of those appropriate for 
cutting half-inch steel. 

c. Given a welding torch, adjust the flame until it is appro- 
priate for cutting half-inch steel. 

d. Tell how you would adjust the flame of a welding torch to 
make it appropriate for cutting half-inch steel. 

(NOTE: Although the actual flame is probably more relevant for testing 
than color slides, it is less practical for discrimination training since 
it would take the instructor considerable time to misadjust a flame to 
present the student with a predesigned array of stimuli.) 

Part B: Performance Assessment 

In scoring Part B, you should be primarily concerned with the techniques 
and processes used to construct test instruments and with the appropriateness 
of test items for assessing achievement of specific objectives. Your 
personal judgment will be a major factor in scoring Part B. However, for 
the test items indicated below, assessment should consider specific factors: 

Item 1. The approaches/techniques selected for measuring achievement of 
the objectives should be appropriate for assessing the specific behavior 
described by the objectives. For example, if an objective specifies 
that a student should be able to demonstrate some job-related task, the 
assessment approach used should require the student to actually demon- 
strate that task, not to take a written multiple-choice test. 

Item 2. Because a Table of Specifications will be developed for instruc- 
tional units primarily representing one learning domain (cognitive, 
affective, or psychomotor), the evaluation plan will emphasize test items 
in that particular domain. This emphasis is reasonable, and one should 
not be downgraded for having an "unbalanced" plan. 
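
A Table of Specifications can be thought of as a grid of content areas against kinds of objectives, with a count of planned test items in each cell. The sketch below is one hypothetical layout in Python: the content areas, the three levels, and the item counts are all invented for illustration, loosely following the retail arithmetic example mentioned earlier in this guide. It totals the items per level so that the emphasis of the plan is visible.

```python
LEVELS = ["knowledge", "comprehension", "application"]

# Rows are content areas; values are planned numbers of test items per level.
# All names and counts here are assumed for illustration.
table = {
    "making change":       [1, 1, 4],
    "computing sales tax": [1, 1, 3],
    "figuring discounts":  [1, 0, 3],
}


def level_totals(table):
    """Sum the planned items in each column (level) of the table."""
    return [sum(row[i] for row in table.values()) for i in range(len(LEVELS))]


def level_shares(table):
    """Express each level's total as a fraction of all planned items."""
    totals = level_totals(table)
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


for level, total, share in zip(LEVELS, level_totals(table), level_shares(table)):
    print(f"{level:<14}{total:>3} items  {share:.0%}")
```

In this invented plan, application carries about two-thirds of the items. That is an "unbalanced" plan of the kind Item 2 treats as reasonable for a unit whose objectives are mostly of one type.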



Item 3. The test items developed should match the specifications of the 
objectives, that is, the specific behavior described by the objective 
should be the specific behavior that the test item assesses. 